dbpedia / dbpedia-links

moved to https://github.com/dbpedia/links
Clean up bookmashup #12

Closed kurzum closed 9 years ago

kurzum commented 11 years ago

We agreed on the mailing list that owl:sameAs is no longer appropriate and should be replaced by something weaker, i.e. rdfs:seeAlso plus a domain-specific property.

cgutteridge commented 11 years ago

rdfs:seeAlso is pretty dang vague as it makes no connection between the two resources except that the target resource may contain some more information about the source resource. It doesn't actually state a connection between the resources beyond that.

I think SKOS is the place to look...

It defines skos:closeMatch and skos:exactMatch which may be what you need.

Christopher Gutteridge -- http://users.ecs.soton.ac.uk/cjg

University of Southampton Open Data Service: http://data.southampton.ac.uk/ You should read the ECS Web Team blog: http://blogs.ecs.soton.ac.uk/webteam/

Would you recommend the software you use to another institution? http://uni-software.ideascale.com/

kurzum commented 11 years ago

I have some problems with the domains and ranges of SKOS. Basically, every DBpedia instance will become a skos:Concept if we use these properties. What can we do about this? Maybe use another property?

cgutteridge commented 11 years ago

I don't understand what the concern is with using SKOS. Yeah, it has some semantics, but does that matter?

owl:sameAs isn't so bad if it's interpreted as "the creator of this dataset asserts that A and B should be considered the same thing, for the purposes of this dataset."

it's not pure logic, there's a degree of context involved.

kurzum commented 11 years ago

owl:sameAs -> sorry, but there are definitely standardized semantics for this. We are not free to interpret it differently, as wrong usage will break the Semantic Web tool chain. Books in bookmashup are foaf:Document, so the linked DBpedia resources should also be compatible with foaf:Document. Otherwise we should not link with owl:sameAs.

skos -> I guess skos:closeMatch is fine as well. In the SKOS world all DBpedia resources are Concepts anyhow... So why not make this explicit? It does no harm. What do you think about "closeMatch" or "exactMatch"?

cgutteridge commented 11 years ago

I'll see if I can get some expert advice, but my personal opinion is to ensure that linked data works and is useful and not lose much sleep over the semantic layer.

sorenroug commented 11 years ago

> I'll see if I can get some expert advice, but my personal opinion is to ensure that linked data works and is useful and not lose much sleep over the semantic layer.

Then we should hang a big warning sign on DBpedia: "Do not load this data into a database that understands semantics".

Actually, skos:closeMatch and skos:exactMatch have no rdfs:domain or rdfs:range attributes. It was a conscious decision by the SKOS authors. They are also reserved for links between concepts of different schemes - i.e. different databases.

The effect of an owl:sameAs link is to effectively merge two concepts. If A and B are linked with owl:sameAs, then all the attributes on object B will be seen when you query A and vice versa.
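
To make the merge effect concrete, here is a minimal Python sketch. It is not how any particular triple store implements owl:sameAs; the `describe` helper, the `ex:` properties and the sample triples are made up for illustration, and only direct sameAs links are followed (no transitive closure).

```python
OWL_SAME_AS = "owl:sameAs"


def describe(resource, triples):
    """Return the (predicate, object) pairs visible on `resource`
    when direct owl:sameAs links are honoured in both directions."""
    aliases = {resource}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            # owl:sameAs is symmetric, so check both ends of the link.
            if s == resource:
                aliases.add(o)
            elif o == resource:
                aliases.add(s)
    return {(p, o) for s, p, o in triples if s in aliases and p != OWL_SAME_AS}


# Hypothetical data: A and B each carry one attribute of their own.
triples = [
    ("A", "ex:title", "Moby-Dick"),
    ("B", "ex:pages", "635"),
    ("A", OWL_SAME_AS, "B"),
]

print(describe("A", triples))
print(describe("B", triples))
```

Both calls report the same two attributes, which is exactly why a wrong sameAs link is costly: every statement about B silently becomes a statement about A.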

kurzum commented 11 years ago

"lose much sleep over the semantic layer" -> that is exactly my attitude as well. Why not use rdfs:seeAlso? If you ignore the semantic layer completely, then the difference between "seeAlso" and "sameAs" is a Levenshtein distance of 4 ;)

cgutteridge commented 11 years ago

I do agree, though, that DBpedia itself shouldn't be asserting sameness. It's (IMO) OK for a standalone linkset to say a bunch of Wikipedia resources should be considered the same as a bunch of other URIs, but that should not be an integral part of DBpedia.

An interesting possibility would be to make skos:closeMatch an integral part, plus a standalone linkset which contains owl:sameAs versions of each relation if people do want to go the sameAs route.

I've asked the maintainer of sameAs.org if he's got any input on the subject.

ps. I only recently learned about Levenshtein distances. I used them in this nifty tool: http://graphite.ecs.soton.ac.uk/checker/ to see if your namespace is nearly-but-not-quite one of the common ones (catches common typos)
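
For anyone curious, the idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the code behind the checker above; the `near_misses` helper, the threshold of 2 and the short namespace list are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# A small, illustrative list of well-known namespaces.
COMMON_NAMESPACES = [
    "http://www.w3.org/2004/02/skos/core#",
    "http://www.w3.org/2002/07/owl#",
    "http://www.w3.org/2000/01/rdf-schema#",
]


def near_misses(namespace, max_distance=2):
    """Return known namespaces that are close to, but not equal to, `namespace`."""
    return [known for known in COMMON_NAMESPACES
            if 0 < levenshtein(namespace, known) <= max_distance]


print(levenshtein("seeAlso", "sameAs"))
print(near_misses("http://www.w3.org/2004/02/skos/core"))  # trailing '#' missing
```

The second call flags the SKOS namespace because the input only lacks the trailing `#`.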

acka47 commented 11 years ago

I also see no problem with skos:closeMatch and skos:exactMatch, as they have no rdfs:domain or rdfs:range defined. But I'd prefer rdfs:seeAlso...

cgutteridge commented 11 years ago

My concern with rdfs:seeAlso is that it implies there is more description of the same resource.

If you state dbpedia:Cheese rdfs:seeAlso fooddb:Cheese, then unless fooddb:Cheese itself states a sameAs, it's not very helpful.

kurzum commented 11 years ago

We can basically define a best practice for DBpedia to use skos:closeMatch. For anybody who thinks it should be stricter, we can recommend a SPARQL CONSTRUCT: CONSTRUCT {?s owl:sameAs ?o } WHERE {?s skos:closeMatch ?o } This should do it and be a compromise between owl:sameAs and rdfs:seeAlso.

tgra commented 11 years ago

have you considered using prism:hasAlternative?

http://www.prismstandard.org/specifications/2.1/PRISM_prism_namespace_2.1.pdf http://en.wikipedia.org/wiki/Publishing_Requirements_for_Industry_Standard_Metadata

text from the above file:

Name: Has Alternative

Identifier: prism:hasAlternative

Definition: Identifies an alternative resource in case the current resource cannot be used (typically because of rights restrictions) or there is a platform-based alternative.

Occurrence: Occurs 0 or more times

Comment: Identifies another resource that can be substituted in place of the current resource. This provides a means for avoiding unsightly things like printing blank rectangles containing "No rights to reproduce this image". It also allows for relating content that differs intellectually when delivered on alternate platforms.

Alternatives are not simply a reformatting of the original work; they are a separate intellectual work. To point to alternatives which are a different resolution, color space, file format, or different delivery platform etc. see dc:hasFormat. For alternatives which are newer or older versions of the same intellectual work, see dcterms:hasVersion. As an example, imagine a publisher distributing an article containing a stock photo to which they did not secure Brazilian rights. If the publisher sent the article to Brazil, they might describe the original image that was published, but suggest an alternative to their syndication partners using prism:hasAlternative.

sorenroug commented 11 years ago

We can recommend such a SPARQL CONSTRUCT, but I don’t think anybody would want to do that. The effect of an owl:sameAs link is to effectively merge two concepts. If A and B are linked with owl:sameAs, then all the attributes on object B will be seen when you query A and vice versa. owl:sameAs is both symmetric and transitive. If:

A skos:closeMatch B
B skos:closeMatch C
C skos:closeMatch D

then after the rewrite A is owl:sameAs B, C and D, and vice versa.
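
A rough Python sketch of that chain, assuming the closeMatch links are first rewritten the way the proposed CONSTRUCT query would and then closed under symmetry and transitivity with a small union-find. The `same_as_clusters` helper and the resource names A to D are illustrative only.

```python
def same_as_clusters(triples):
    """Group resources into owl:sameAs equivalence classes (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)  # merge the two classes

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


close_matches = [
    ("A", "skos:closeMatch", "B"),
    ("B", "skos:closeMatch", "C"),
    ("C", "skos:closeMatch", "D"),
]

# Equivalent of: CONSTRUCT {?s owl:sameAs ?o} WHERE {?s skos:closeMatch ?o}
rewritten = [(s, "owl:sameAs", o) for s, p, o in close_matches
             if p == "skos:closeMatch"]

print(same_as_clusters(rewritten))
```

Three loose pairwise matches collapse into a single class containing A, B, C and D, even though nobody ever claimed that A and D are close.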

Best regards, Søren Roug

kurzum commented 11 years ago

No, I meant that if somebody wants owl:sameAs instead of skos:closeMatch, they can use the CONSTRUCT query. There doesn't seem to be consensus on whether to use skos:closeMatch or rdfs:seeAlso. I lean towards skos:closeMatch, but in the end I don't have a strong opinion about this.

cgutteridge commented 11 years ago

The feedback I got from colleagues at Southampton was that skos:closeMatch is most widely used when people want to avoid asserting full-on sameAs.

csarven commented 11 years ago

Since a DBpedia resource draws on a largely uncontrolled property space (at least for now), it is difficult to rely on the outcome of a merge when an owl:sameAs is in place. This is probably true in any case, since whoever is making the owl:sameAs relation feels semantically strongly about it. We only need to look at sameAs.org to see what kind of chaos comes out when people default to owl:sameAs.

I see skos:relatedMatch and rdfs:seeAlso more or less on the same level.

I prefer skos:(close|exact)Match for the reasons already mentioned here. Anyone who wishes to build a strong relation like owl:sameAs can go ahead with CONSTRUCT in any case. By that point, they are (hopefully) checking for additional patterns.

HughGlaser commented 11 years ago

As the person who republishes stuff at sameAs.org, perhaps I can help with some comments. Oh dear, it was going to be a couple of points, but has grown - never mind, here goes!

1) sameAs.org harvests from a lot of places and a lot of predicates - owl:sameAs, skos:closeMatch, skos:exactMatch, dbpedia:redirect for a start; although it does not include all the triples it finds, as there is some quality assessment before including things. It then (re-)publishes them all using owl:sameAs - this was a choice to make it easy for people to consume.

Why owl:sameAs? Well, sameAs.org predates SKOS (or at least my knowledge of SKOS's existence), so there wasn't any other game in town. But in any case, consumers are free to rewrite the predicate to anything they like - in fact, I have often said that I would add an argument to the sameAs.org queries to say what predicate you want back, if anyone actually asked for it. A bit like JSONP. The bottom line here is that there are lots of people worrying about things, some few producers and almost no consumers! So making a service that quite possibly does what people want really simply was important.

Why accept lots more predicates? Well, sameAs.org is a search engine, so the users would like the net cast quite widely, in case there is a triple involving something they want. For example, the geopolitical entity London is not owl:sameAs the geographical entity in many people's views. But if I am looking for facts about London, then I want to be able to see all of them, in case the person who published used the other URI. In this sense it is aimed at being low precision but high recall. I actually have other services that are much stricter, such as http://sameas.org/store/britishlibrary/ and http://sameas.org/store/freebase/ where the publishers want to make strong statements, rather than feed a search engine, and these have high precision with possibly correspondingly low recall.

2) If you want my vote, then no, don't create a new predicate, and don't use an obscure one. Use skos:closeMatch and/or skos:exactMatch, which are widely known and pretty much what is needed (for example, the maintainer of sameAs.org would understand their use :-) ). And in fact, if anyone convinces me they will use it in anger, I could bring up stores with those names that only have those predicates in.

rdfs:seeAlso is highly unlikely to be interpreted the way you want, I think. I have never found any such triple that was even useful enough for the liberal sameAs.org. Following an rdfs:seeAlso does not mean that you find anything about the URI over there, and the target URI is unlikely to even be anything like skos:closeMatch.

The problem is that there are a lot of people who like generating this stuff, but as I said, barely anyone using it and giving feedback based on real problems - most of the issues people raise are based on what they think about how things should be modelled, with scarce experience of whether the modelling is "fit for purpose" in the engineering sense.

3) None of these relations should be in the default dbpedia.org resolution. When someone accesses a dbpedia.org/resource, they expect to get a reflection of what is in Wikipedia - it is entirely wrong to also add some crowd-sourced triples to the same resource, and certainly it would be wrong to put them in the same graph in a store.

4) I live in a Linked Data world - this means that the whole system has to be able to work without RDF stores (just URI resolution). So solutions that rely on SPARQL queries and CONSTRUCT etc. are not solutions for me.

5) Sarven, perhaps you can give me some examples of real systems that use sameAs.org that result in chaos.

Best Hugh Glaser

csarven commented 11 years ago

Hugh, great feedback! I'm in agreement with all your points. Actually, I didn't know that you could rewrite the predicates - cool!

The point that I was trying to make was not that the sameAs.org is causing problems, but that it is a good and simple way to demonstrate how relying on owl:sameAs may be counter-productive, especially if the consumers don't want to end up with an overwhelming list of statements that may be loosely related which they have to deal with, given that they all come from different sources and for different reasons. Having said that, I understand your point on precision:recall and I obviously can't point to any chaos based on that :)

HughGlaser commented 11 years ago

Thanks Sarven. Yes, I think the results from sameAs do look rather bizarre sometimes :-) Somehow closeMatch.org wouldn't be so catchy!

kurzum commented 11 years ago

Hi Hugh, 2) I am wondering what is your reason against having both? Having skos:closeMatch AND a domain specific property doesn't really hurt. I think, it helps to specify the relation closer, for those that can do something with it, e.g. librarians.

3) Nobody expects what you describe. DBpedia has had extra information from the beginning, e.g. links, and the DBpedia Ontology is not native to Wikipedia either (nor are the mappings). So in fact there is nothing new, just a better way to collect links and get a denser LOD cloud.

HughGlaser commented 11 years ago

Hi Kurzum, 2) Good point. Yes, having multiple predicates actually helps consuming. I assume you mean something like how I often put in all of dc:title, dcterms:title and rdfs:label for the same "fact" for a paper. So, say, skos:closeMatch and foo:bar (where perhaps foo:bar is an rdfs:subPropertyOf skos:closeMatch). Then consumers can choose either, but use your foo:bar if they know about it and want to align directly with your meaning. In fact, if you are publishing from a store, and have the reasoning capability, you can put the rdfs:subPropertyOf in the store and the whole thing should work. But crucially the consumer won't need to know anything about it. I think that publishers too rarely put such "redundant" triples into data, moving the onus to the consumer to comply with their ontologies, or have some clever property/class relations along with a system that does inference.
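
As a sketch of what publishing the "redundant" triples could look like without a reasoner, the Python below materialises the super-property alongside the specific one. The `foo:bar` property, the `bookmashup:book1` URI, the one-parent-per-property mapping and the helper name are assumptions for illustration, not part of any DBpedia tooling.

```python
# Hypothetical schema knowledge: foo:bar rdfs:subPropertyOf skos:closeMatch.
SUB_PROPERTY_OF = {"foo:bar": "skos:closeMatch"}


def materialise_super_properties(triples, sub_property_of=SUB_PROPERTY_OF):
    """Add, for every triple, the same statement under each super-property."""
    result = set(triples)
    for s, p, o in triples:
        seen = set()
        # Walk up the property hierarchy; `seen` guards against cycles.
        while p in sub_property_of and p not in seen:
            seen.add(p)
            p = sub_property_of[p]
            result.add((s, p, o))
    return result


links = {("dbpedia:Moby-Dick", "foo:bar", "bookmashup:book1")}

for triple in sorted(materialise_super_properties(links)):
    print(triple)
```

A consumer that only knows skos:closeMatch can use the second triple and never needs to load the foo: vocabulary.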

3) Just because it has always been done doesn't make it right. (Right from the beginning I had to remove the links to flickwrapper because they were unreliable.) If you listen to what people say when they describe DBpedia, you will never hear them say that there is a bunch of stuff from the crowd - it just isn't how they think of it. I wonder how easy it would be to describe the PROV of DBpedia with the crowd-sourced stuff in? If I take a few URIs from DBpedia and drop them in my store, thinking that all I have added is "facts" from Wikipedia, all sorts of strange things happen as resources get combined (including erroneously). On the other hand, people do expect the DBpedia ontology. And it is all about to get worse.

HughGlaser commented 11 years ago

On a different (sic) topic. Will there be anything for http://differentfrom.org ? It isn't as stupid as it sounds. In generating sameness information it sometimes turns out that things that at first or second look were the same turn out to be different. This is very valuable information - it certainly cost a lot of work to find. Such owl:differentFrom (SKOS has no similar predicate I think) stuff is therefore very useful, and needs to be stored for future use, and even published so others don't make the same mistake.

Essentially it is the system's regression test data. In fact that is where I get http://differentfrom.org/freebase/ from (Freebase/Google are kind enough to send me their regression test list to put there) and http://differentfrom.org/latc/ (I got this from the LATC web site). In fact, in finding dbpedia links you may well find http://differentfrom.org useful.
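
To illustrate the regression-test idea, here is a small Python sketch that screens candidate links against a list of pairs already known to be different. The `filter_candidates` helper and the example URIs are hypothetical; the real differentfrom.org data would have to be fetched and parsed separately.

```python
def filter_candidates(candidates, known_different):
    """Split candidate link pairs into accepted and rejected, where a pair is
    rejected if it was previously recorded as owl:differentFrom."""
    # frozenset makes the check order-independent: (a, b) blocks (b, a) too.
    blocked = {frozenset(pair) for pair in known_different}
    accepted, rejected = [], []
    for a, b in candidates:
        target = rejected if frozenset((a, b)) in blocked else accepted
        target.append((a, b))
    return accepted, rejected


# Hypothetical data: the second candidate repeats a known mistake, reversed.
known_different = [("ex:London_city", "ex:London_Ontario")]
candidates = [
    ("ex:London_city", "geo:London"),
    ("ex:London_Ontario", "ex:London_city"),
]

accepted, rejected = filter_candidates(candidates, known_different)
print(accepted)
print(rejected)
```

Running a linking tool's output through such a filter before publishing means a mistake that was corrected once does not come back with the next batch.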

HughGlaser commented 11 years ago

Finally, if anyone wants a http://sameAs.org/store and/or a http://differentfrom.org store, please email me and I will create one (although I am travelling and only have intermittent connection at the moment).