dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

RML class expansion considered harmful #508

Open VladimirAlexiev opened 7 years ago

VladimirAlexiev commented 7 years ago

@wmaroy, @andimou, @kurzum, @jimkont: this RML mapping https://github.com/dbpedia/extraction-framework/blob/rml/mappings/rml/en/Mapping_en:DavisCup_player.ttl expands a single class, dbo:TennisPlayer, into the following list:

        rr:class     dul:Agent ,
                     owl:Thing ,
                     dbo:Agent ,
                     foaf:Person ,
                     wikidata:Q10833314 ,
                     schema:Person ,
                     dbo:TennisPlayer ,
                     dul:NaturalPerson ,
                     dbo:Person ,
                     dbo:Athlete ,
                     wikidata:Q5 ,
                     wikidata:Q215627 ;

I believe such superclass expansion is harmful, and only the original class should be emitted. If someone wants the superclasses, they can get them easily enough with subClassOf reasoning, whereas it's much harder to remove triples you don't want. Current email discussions indicate that people want the correspondences to other ontologies kept separate from the DBO ontology proper, which is in line with what I argued earlier: http://vladimiralexiev.github.io/pres/20150209-dbpedia/dbpedia-problems-long.html#sec-8.
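The "subClassOf reasoning" mentioned here can be sketched in a few lines of Python. This is a minimal sketch, not DBpedia code: it assumes triples are plain (subject, predicate, object) string tuples, uses a hypothetical, hand-picked excerpt of the DBO hierarchy (TennisPlayer, Athlete, Person, Agent, owl:Thing), and dbr:Some_Player is an invented resource. It applies the RDFS entailment rule rdfs9 until nothing new is derived, so the mapping itself only needs to emit the one direct class.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical excerpt of the DBO class hierarchy, for illustration only.
ONTOLOGY: Set[Triple] = {
    ("dbo:TennisPlayer", "rdfs:subClassOf", "dbo:Athlete"),
    ("dbo:Athlete", "rdfs:subClassOf", "dbo:Person"),
    ("dbo:Person", "rdfs:subClassOf", "dbo:Agent"),
    ("dbo:Agent", "rdfs:subClassOf", "owl:Thing"),
}

# What a mapping that emits only the direct class would produce.
EXTRACTED: Set[Triple] = {("dbr:Some_Player", "rdf:type", "dbo:TennisPlayer")}


def rdfs9_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply RDFS rule rdfs9 until fixpoint:
    (x rdf:type C) and (C rdfs:subClassOf D) entail (x rdf:type D)."""
    graph = set(triples)
    while True:
        sub = {(c, d) for c, p, d in graph if p == "rdfs:subClassOf"}
        new = {
            (x, "rdf:type", d)
            for x, p, c in graph
            if p == "rdf:type"
            for c2, d in sub
            if c2 == c
        } - graph
        if not new:
            return graph
        graph |= new


def types_of(subject: str, graph: Set[Triple]) -> Set[str]:
    """All rdf:type values of a subject in the graph."""
    return {o for s, p, o in graph if s == subject and p == "rdf:type"}


closed = rdfs9_closure(ONTOLOGY | EXTRACTED)
print(sorted(types_of("dbr:Some_Player", closed)))
# ['dbo:Agent', 'dbo:Athlete', 'dbo:Person', 'dbo:TennisPlayer', 'owl:Thing']
```

The foaf:, schema:, dul: and wikidata: classes in the list above presumably come from owl:equivalentClass links in the ontology file; a consumer who wants them can add those axioms to the reasoning, and one who doesn't simply leaves them out.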

kurzum commented 7 years ago

Hm, these classes must be new; I hadn't seen them before. Personally, I would even go so far as to discuss having no classes in the mappings at all. We could keep them in the ontology directly, as they are perfectly expressible in OWL: dbo:TennisPlayer == dbp:wikiPageUsesTemplate hasValue (dbr:Template:DavisCup_player). A reasoner would then assign classes based on the properties, which creates a separation of concerns.
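In OWL this axiom would be an owl:equivalentClass between dbo:TennisPlayer and an owl:Restriction with owl:onProperty dbp:wikiPageUsesTemplate and owl:hasValue dbr:Template:DavisCup_player. The Python below is a minimal sketch, not an OWL reasoner: it assumes string-tuple triples and applies only the "uses the template, therefore is a TennisPlayer" direction of such hasValue definitions. dbr:Some_Player and dbr:Some_Country are invented resources.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# class -> (property, required value), read as "class == property hasValue value".
HAS_VALUE_DEFS: Dict[str, Tuple[str, str]] = {
    "dbo:TennisPlayer": ("dbp:wikiPageUsesTemplate", "dbr:Template:DavisCup_player"),
}


def classify(triples: Set[Triple], defs: Dict[str, Tuple[str, str]]) -> Set[Triple]:
    """Infer an rdf:type triple for every subject that has the required value."""
    inferred: Set[Triple] = set()
    for cls, (prop, value) in defs.items():
        for s, p, o in triples:
            if p == prop and o == value:
                inferred.add((s, "rdf:type", cls))
    return inferred


data = {
    ("dbr:Some_Player", "dbp:wikiPageUsesTemplate", "dbr:Template:DavisCup_player"),
    ("dbr:Some_Country", "dbp:wikiPageUsesTemplate", "dbr:Template:Infobox_country"),
}
print(classify(data, HAS_VALUE_DEFS))
# {('dbr:Some_Player', 'rdf:type', 'dbo:TennisPlayer')}
```

Because owl:equivalentClass works in both directions, the axiom as written would also entail that every dbo:TennisPlayer uses that template; stating the restriction as an rdfs:subClassOf of dbo:TennisPlayer would capture only the direction sketched here.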

wmaroy commented 7 years ago

These classes were assigned automatically; they were derived from the ontology file. Limiting the RML mappings on GitHub to a single class assignment is indeed the better option. Another option is an additional reasoning step that adds the related classes to the RML mappings just before they are executed. I'll change it.
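The extra step described here, keeping one class in the mapping on GitHub and expanding it only just before execution, could look roughly like this. It is a sketch under stated assumptions: the mapping is modelled as a plain dict rather than parsed RML, and DIRECT_SUPERS is a hypothetical excerpt of the hierarchy; neither reflects the actual RML tooling.

```python
import copy
from collections import deque
from typing import Dict, List

# Hypothetical excerpt: class -> direct superclasses.
DIRECT_SUPERS: Dict[str, List[str]] = {
    "dbo:TennisPlayer": ["dbo:Athlete"],
    "dbo:Athlete": ["dbo:Person"],
    "dbo:Person": ["dbo:Agent"],
    "dbo:Agent": ["owl:Thing"],
}


def expand_classes(classes: List[str], supers: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk up the hierarchy; returns the classes plus all ancestors."""
    seen = set(classes)
    queue = deque(classes)
    while queue:
        for parent in supers.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)


def prepare_for_execution(mapping: dict, supers: Dict[str, List[str]]) -> dict:
    """Return an expanded copy; the source mapping keeps its single class."""
    expanded = copy.deepcopy(mapping)
    subject_map = expanded["subjectMap"]
    subject_map["rr:class"] = expand_classes(subject_map["rr:class"], supers)
    return expanded


source = {"subjectMap": {"rr:class": ["dbo:TennisPlayer"]}}
runtime = prepare_for_execution(source, DIRECT_SUPERS)
print(runtime["subjectMap"]["rr:class"])
# ['dbo:Agent', 'dbo:Athlete', 'dbo:Person', 'dbo:TennisPlayer', 'owl:Thing']
```

Since the expanded copy is regenerated on every run, a change to the class hierarchy only has to be made in the ontology, not in each mapping file.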

kurzum commented 7 years ago

Note that my reasoning suggestion would not only add the inferred superclasses, but would also infer the dbo class itself, which currently has to be put into the mapping in the first place. Maybe we can take this decision later and decide then where to keep the mapping-to-class assignment. If we modularize too much, we run into synchronisation problems. @jimkont, do you have an opinion about this?

andimou commented 7 years ago

@kurzum

If you do not want classes to be included in the mapping rules, that's entirely possible. But, acting as devil's advocate here: if there are no classes in the mapping rules, no classes will be generated, and then people can't query DBpedia by class. Is this something you want?

If so, how would you resolve this? Would you do the inferencing on the SPARQL endpoint side? That would make queries even more costly. Or something else?
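Inferencing on the endpoint side means query-time reasoning. In SPARQL 1.1 it can be expressed with a property path such as ?s rdf:type/rdfs:subClassOf* dbo:Person, and evaluating that path on every query is where the extra cost comes from. The Python sketch below mimics such an evaluation under stated assumptions (a hypothetical hierarchy excerpt and the invented resources dbr:Some_Player and dbr:Some_Club): only direct classes are stored, and each query first collects every subclass of the requested class, then matches the asserted types.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical excerpt: class -> direct superclasses.
DIRECT_SUPERS: Dict[str, List[str]] = {
    "dbo:TennisPlayer": ["dbo:Athlete"],
    "dbo:Athlete": ["dbo:Person"],
    "dbo:Person": ["dbo:Agent"],
    "dbo:Organisation": ["dbo:Agent"],
    "dbo:Agent": ["owl:Thing"],
}

# Only the direct classes are stored: (subject, class) pairs.
ASSERTED: Set[Tuple[str, str]] = {
    ("dbr:Some_Player", "dbo:TennisPlayer"),
    ("dbr:Some_Club", "dbo:Organisation"),
}


def subclasses_of(target: str, supers: Dict[str, List[str]]) -> Set[str]:
    """Every class C with C rdfs:subClassOf* target (reflexive, transitive)."""
    result = {target}
    changed = True
    while changed:
        changed = False
        for child, parents in supers.items():
            if child not in result and result.intersection(parents):
                result.add(child)
                changed = True
    return result


def instances_of(target: str, asserted: Set[Tuple[str, str]],
                 supers: Dict[str, List[str]]) -> Set[str]:
    """Evaluate ?s rdf:type/rdfs:subClassOf* target over the asserted types."""
    wanted = subclasses_of(target, supers)
    return {s for s, cls in asserted if cls in wanted}


print(sorted(instances_of("dbo:Person", ASSERTED, DIRECT_SUPERS)))  # ['dbr:Some_Player']
print(sorted(instances_of("dbo:Agent", ASSERTED, DIRECT_SUPERS)))   # ['dbr:Some_Club', 'dbr:Some_Player']
```

Every query pays for the hierarchy walk, whereas materializing the superclasses once at load time pays in storage instead; that is the trade-off behind the question.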

VladimirAlexiev commented 7 years ago

@kurzum proposes class assignment using a simple global template -> class list. But this would break conditional class assignment; see e.g. http://vladimiralexiev.github.io/pres/20150209-dbpedia/dbpedia-problems-long.html#sec-3-1, where a number of fields are checked to assign a class: "members", "former_members", "created", "background" and "suffix" (and the check on "background" is not even for an exact value but an "includes" test).
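A flat template -> class lookup cannot express rules like these. The sketch below shows one way to model conditional class assignment as an ordered list of predicates over infobox fields, including a substring ("includes") test. The field names come from the comment, but the rules, the ex: classes and the default are invented for illustration; they are not the logic from the linked presentation.

```python
from typing import Callable, Dict, List, Tuple

Infobox = Dict[str, str]
Condition = Callable[[Infobox], bool]


def has_any(*fields: str) -> Condition:
    """True if at least one of the fields is present and non-blank."""
    return lambda box: any(box.get(f, "").strip() for f in fields)


def includes(field: str, needle: str) -> Condition:
    """True if the field's value contains the substring (not an exact match)."""
    return lambda box: needle in box.get(field, "")


# Hypothetical rules, checked in order; the first match wins.
RULES: List[Tuple[Condition, str]] = [
    (has_any("members", "former_members"), "ex:Band"),
    (includes("background", "group"), "ex:Band"),
]


def assign_class(box: Infobox, rules: List[Tuple[Condition, str]] = RULES,
                 default: str = "ex:SoloArtist") -> str:
    """Return the class of the first matching rule, else the default."""
    for condition, cls in rules:
        if condition(box):
            return cls
    return default


print(assign_class({"former_members": "A, B"}))  # ex:Band
print(assign_class({"background": "solo"}))      # ex:SoloArtist
```

Keeping the rules as data next to the mapping, rather than in a global list, is what lets each template carry its own conditions.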

jimkont commented 7 years ago

In my opinion, the RML should contain class mappings but keep only the most direct one, and we can decide separately how to generate the transitive ones. Keeping them all here makes maintenance harder, e.g. when we change the class hierarchy. So this one should keep only dbo:TennisPlayer.
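Going the other way, an already expanded list like the one at the top of this issue can be reduced to its most direct class by dropping every class that is an ancestor of another class in the set. A minimal sketch, assuming an acyclic, hypothetical hierarchy excerpt; classes linked only through owl:equivalentClass (foaf:Person, schema:Person, the wikidata: classes and so on) would survive unless those links are modelled too.

```python
from typing import Dict, List, Set

# Hypothetical excerpt: class -> direct superclasses (assumed acyclic).
DIRECT_SUPERS: Dict[str, List[str]] = {
    "dbo:TennisPlayer": ["dbo:Athlete"],
    "dbo:Athlete": ["dbo:Person"],
    "dbo:Person": ["dbo:Agent"],
    "dbo:Agent": ["owl:Thing"],
}


def ancestors(cls: str, supers: Dict[str, List[str]]) -> Set[str]:
    """All transitive superclasses of cls (cls itself excluded if acyclic)."""
    seen: Set[str] = set()
    stack = list(supers.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(supers.get(parent, []))
    return seen


def most_specific(classes: Set[str], supers: Dict[str, List[str]]) -> Set[str]:
    """Keep only classes that are not an ancestor of another class in the set."""
    redundant: Set[str] = set()
    for cls in classes:
        redundant |= ancestors(cls, supers)
    return classes - redundant


expanded = {"owl:Thing", "dbo:Agent", "dbo:Person", "dbo:Athlete", "dbo:TennisPlayer"}
print(most_specific(expanded, DIRECT_SUPERS))  # {'dbo:TennisPlayer'}
```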