SuLab / GeneWikiCentral

GeneWiki Organization
MIT License

figure out which "mapping type" data model is more "correct" #120

Open andrewsu opened 4 years ago

andrewsu commented 4 years ago

I just noticed that there is a property for Exact match https://www.wikidata.org/wiki/Property:P2888

Use of that property appears to be redundant with the item for Exact match https://www.wikidata.org/wiki/Q39893449 which we use with the Mapping type property https://www.wikidata.org/wiki/Property:P4390. My guess is that both methods are trying to express the same thing (and both have corresponding properties/items for broad/narrow matches).

I'm putting this here as a placeholder for us to figure out which is more commonly used, maybe bring it up in a talk page for more discussion...

andrawaag commented 4 years ago

They serve different purposes. Mapping relations map to external identifiers, which are not necessarily linked data resources, whereas the exact match property points to related URIs. I proposed P2888 some time ago to enable federated queries. The mapping relation qualifier can be used in federated queries, but it leads to quite complex queries (with possible timeouts), since the way qualifiers are modelled in Wikidata RDF is not straightforward and it is not directly clear whether the mapped identifier points to a linked data resource.
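To make the difference in query complexity concrete, here is a minimal sketch that builds both SPARQL shapes for one item. The property and item IDs P4390, P2888 and Q39893449 come from this thread; using P699 (Disease Ontology ID) as the identifier property is an assumption for illustration, and the helper names are hypothetical. The `wd:`, `wdt:`, `p:`, `ps:` and `pq:` prefixes are the ones predefined by the Wikidata Query Service.

```python
# Sketch only: builds query strings, does not contact any endpoint.
MAPPING_TYPE_PROP = "P4390"      # "mapping relation type" qualifier (from the thread)
EXACT_MATCH_ITEM = "Q39893449"   # item "exact match" (from the thread)
EXACT_MATCH_PROP = "P2888"       # property "exact match" (from the thread)


def qualifier_query(item_qid: str, id_prop: str) -> str:
    """Identifiers on item_qid whose statement is qualified as an exact match.

    Needs the full statement node (p:/ps:/pq:) to reach the qualifier.
    """
    return (
        "SELECT ?identifier WHERE {\n"
        f"  wd:{item_qid} p:{id_prop} ?statement .\n"
        f"  ?statement ps:{id_prop} ?identifier ;\n"
        f"             pq:{MAPPING_TYPE_PROP} wd:{EXACT_MATCH_ITEM} .\n"
        "}"
    )


def direct_query(item_qid: str) -> str:
    """URIs linked from item_qid through the exact match property.

    A single 'truthy' triple pattern; the result is already a URI.
    """
    return (
        "SELECT ?uri WHERE {\n"
        f"  wd:{item_qid} wdt:{EXACT_MATCH_PROP} ?uri .\n"
        "}"
    )


if __name__ == "__main__":
    # Q18576 is the Pick disease item discussed in this thread;
    # P699 as the identifier property is an assumption.
    print(qualifier_query("Q18576", "P699"))
    print(direct_query("Q18576"))
```

The qualifier version returns a local identifier string that still has to be turned into a URI before it can be joined against another endpoint, while the P2888 version returns a URI directly.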

andrewsu commented 4 years ago

so just to be sure I understand, let's look at the item for Pick disease https://www.wikidata.org/wiki/Q18576. I see two statements:

Both of these statements express the same thing, but one is easier to use in federated queries. Do I have that right?

If so, this is a weird example on that same item, right?

andrawaag commented 4 years ago

Yes, that is correct. Actually DOID:11870 is a nice example. The URI contains DOID_11870, and translating from DOID: to DOID_ is not easy, at least not using the URL formatter property in Wikidata. Things get more complicated if the identifier is not part of the URI. Some resources use a hash or UUID instead. In those cases, it is not possible to infer a URI from the given local identifier. The main thing is that by using exact match (P2888) we can leverage downstream nodes in that representation, whereas with the local identifier one has to parse the HTML page, because that is usually where the resolver URL points to.
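The DOID case can be sketched as follows. Plain `$1` substitution, which is how a formatter URL works, keeps the colon, so the result is not the ontology URI; a resource-specific rewrite is needed instead. The OBO PURL base and the formatter string below are assumptions for illustration, not values read from Wikidata, and both function names are hypothetical.

```python
# Assumed URI base for Disease Ontology terms (OBO PURL convention).
OBO_BASE = "http://purl.obolibrary.org/obo/"


def apply_formatter(formatter_url: str, local_id: str) -> str:
    """Naive formatter-style expansion: replace the $1 placeholder verbatim."""
    return formatter_url.replace("$1", local_id)


def doid_to_uri(local_id: str) -> str:
    """Resource-specific rewrite from 'DOID:<digits>' to '<base>DOID_<digits>'."""
    prefix, sep, number = local_id.partition(":")
    if prefix != "DOID" or sep != ":" or not number.isdigit():
        raise ValueError(f"not a DOID identifier: {local_id!r}")
    return f"{OBO_BASE}DOID_{number}"


if __name__ == "__main__":
    # Hypothetical formatter string; the colon survives the substitution.
    print(apply_formatter(OBO_BASE + "$1", "DOID:11870"))
    # Custom rule yields the underscore form used in the URI.
    print(doid_to_uri("DOID:11870"))
```

A rule like `doid_to_uri` has to be written per resource, and no such rule exists when the URI is built from a hash or UUID, which is the case a stored exact match (P2888) value avoids.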

I agree with you that your last example is a weird one and probably incorrect. It would help to propose broad match and narrow match as properties to Wikidata, while keeping the mapping relation property as an option. That will be a tough one, because not everyone appreciates the federated query use case and some will probably object. Having said that, going forward it might be worth the effort.