dbpedia / ontology-tracker

Here we keep track of modification requests in the DBpedia Ontology

switch from rdfs:domain/range to schema:domainIncludes/rangeIncludes #14

Open VladimirAlexiev opened 8 years ago

VladimirAlexiev commented 8 years ago
  1. The Ontology Survey shows that people want Union classes.

This need is also borne out by domain/range validation: http://mappings.dbpedia.org/validation/index.html E.g. filter to lang="en", predicate="temp": this shows that maximumTemperature is only defined for Planet but is also used for Lake, Sea, etc. Tracing the class hierarchy doesn't reveal a useful super-class of these, so the property needs two domain classes: Planet and BodyOfWater.

Therefore I suggest allowing several domains & ranges in the ontology definition.

  2. But using multiple rdfs:domain or rdfs:range for this purpose would be a disaster. RDFS semantics would then infer that any resource with maximumTemperature is BOTH a Planet and a BodyOfWater.
  3. The current single rdfs:domain and rdfs:range are wishful thinking. If they were correct, we should delete all statements appearing in domain/range validation, but that's obviously wrong. So switching to schema:domainIncludes and schema:rangeIncludes will reflect more accurately the meaning in the ontology wiki.
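
To make the difference between the two readings concrete, here is a minimal Python sketch. It is not DBpedia code: the function names, the subject `Some_Lake` and the value `20` are made up for illustration, and subclass reasoning is ignored. It applies the RDFS domain rule to a property that has two domains, then treats the same two classes as advisory hints in the spirit of schema:domainIncludes.

```python
# Hypothetical toy model: triples are (subject, predicate, object) tuples.
# The same class set is read two ways: as rdfs:domain and as domainIncludes.
DOMAINS = {"maximumTemperature": {"Planet", "BodyOfWater"}}


def infer_types_rdfs(triples, domains):
    """RDFS domain rule: every declared domain becomes a type of the subject."""
    inferred = {}
    for subject, prop, _ in triples:
        inferred.setdefault(subject, set()).update(domains.get(prop, set()))
    return inferred


def violates_domain_includes(subject_types, prop, domain_includes):
    """Advisory reading: fine if the subject has at least one listed type."""
    expected = domain_includes.get(prop, set())
    return bool(expected) and not (subject_types & expected)


triples = [("Some_Lake", "maximumTemperature", "20")]

# Under rdfs:domain the lake is inferred to be a Planet as well.
types = infer_types_rdfs(triples, DOMAINS)

# Under the advisory reading nothing is inferred; the usage is merely checked.
lake_ok = not violates_domain_includes(
    {"Lake", "BodyOfWater"}, "maximumTemperature", DOMAINS
)
```

With rdfs:domain the set of inferred types grows with every extra domain, whereas the advisory check only asks whether one of the listed classes applies.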

VladimirAlexiev commented 8 years ago

Discussion in https://docs.google.com/document/d/1pQPO61d3RJY05yHSxlcu4DsR1NEcW8n9URoTci4lFJY/edit#

ghost commented 4 years ago

What is the problem with the following?

maximumTemperature domain PhysicalObject
Planet subClassOf PhysicalObject
Lake subClassOf BodyOfWater
Sea subClassOf BodyOfWater
BodyOfWater subClassOf PhysicalObject

If you cannot say anything about the domain, you can still add a HasTemperature class, but with a little thinking it is possible to find something meaningful.
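
As a rough illustration of this proposal, the following Python sketch walks the hypothetical hierarchy listed above and checks that every class using maximumTemperature falls under the single domain PhysicalObject. The dictionary mirrors the statements in this comment only, not the actual DBpedia ontology, and the helper names are invented; single inheritance is assumed for brevity.

```python
# The proposed hierarchy from this comment (hypothetical, single inheritance).
SUBCLASS_OF = {
    "Planet": "PhysicalObject",
    "Lake": "BodyOfWater",
    "Sea": "BodyOfWater",
    "BodyOfWater": "PhysicalObject",
}


def ancestors(cls, subclass_of):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    while cls in subclass_of:
        cls = subclass_of[cls]
        chain.append(cls)
    return chain


def satisfies_domain(cls, domain, subclass_of):
    """True if cls is the domain class or one of its subclasses."""
    return domain in ancestors(cls, subclass_of)


# Every class that uses maximumTemperature is covered by the one domain.
all_covered = all(
    satisfies_domain(c, "PhysicalObject", SUBCLASS_OF)
    for c in ("Planet", "Lake", "Sea")
)
```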

VladimirAlexiev commented 4 years ago

@inf3rno The current hierarchy is BodyOfWater<NaturalPlace and Planet<CelestialBody. Of course you could introduce a superclass PhysicalObject or HasTemperature (this latter one is usually called a mixin). The problem is that the creators of the ontology have not found this useful until now.

So I would say that your proposals introduce useless abstract classes. "Useless" is not an overly strong term: schema.org has rejected the creation of "Agent", a super-class of Person and Organization, after a substantial discussion. I personally think such a class is needed, but the community has disagreed.

Why should we accept useless abstract classes? Where do we stop, do we also introduce mixins like Nameable, Measurable, etc etc? Better to use polymorphic characteristics like schema:domainIncludes instead of monomorphic like rdfs:domain.

kurzum commented 4 years ago

We discussed at length how to structure the DBpedia Ontology. I think in the end it is super convenient to have it flat and simple. This makes it more maintainable and keeps the props simple, which helps with data integration. If you want to complicate it, we now have the option to make your own ontology on the Databus (can be fetched from Github or the OntologyURL) and then we can evolve and load it alongside. This gives you the freedom to make everything abstract without discussing it tediously. On the other hand, if people like your work they can use it. This is a much simpler process.

I am thinking about something called OntoFlow (like GitFlow) or "OntoGrate" (Ontology + Integration). This is what we need. Non-blocking editing, not an ineffective ontology committee and discussion over discussion.

ghost commented 4 years ago

@VladimirAlexiev I think irrelevant is the proper word here. It is like programming: you add only the relevant part of reality to your model. Sometimes you even use estimation, simplification, etc. If you don't want to use inference and just want something that works, then using schema.org properties is better.

VladimirAlexiev commented 4 years ago

@inf3rno Right! You can't/shouldn't use RDFS inference with DBpedia. Eg from ?x dbo:parent ?y you can't conclude anything about x and y

kurzum commented 4 years ago

@VladimirAlexiev just because the hierarchy is modest doesn't mean you can't do RDFS reasoning. Actually, a lot of people do RDFS inferencing, i.e.

Also OWL inference such as sameAs, equivalentClass is there and more sophisticated stuff will come soon. Albeit I would like to exploit SHACL more.

So why do you say this?

kurzum commented 4 years ago

@VladimirAlexiev also a question. Do you know the correct semantic interpretation of:

dbo:maxTemp rdfs:domain dbo:BodyOfWater .
dbo:maxTemp rdfs:domain dbo:Planet .

This is interpreted as inferring both types, right?

Also we have type specific properties:

dbo:Planet/maxTemperature and dbo:BodyOfWater/maxTemperature

VladimirAlexiev commented 4 years ago

@kurzum yes, it should infer that the subject is both Planet and BodyOfWater, which is nonsense (except in the Waterworld movie).

Similarly for parent, RDFS reasoning will infer many things to be Persons which they are not. Eg the second object from this infobox line

  mother = [[Elizabeth]], queen of [[England]]

This is exactly why I've proposed this issue.

In dbpedia, domain and range are purely advisory because the extractor does not enforce them. (That was the case 3y ago, and is still true afaik). Enforcing them won't be easy to do, and would throw out triples we're not certain should be removed.

VladimirAlexiev commented 4 years ago

Type-specific props are a bad idea because

Further, there are no type-specific Object props, so they are not relevant to the discussion

jimkont commented 4 years ago

In dbpedia, domain and range are purely advisory because the extractor does not enforce them. (That was the case 3y ago, and is still true afaik).

There has been a post-processing clean-up step that is configured to remove such triples. It is easily extensible but is currently configured to remove triples when the object is of a type that is owl:disjointWith the expected range. The same applies to the subject and the rdfs:domain.

see mappingbased_objects_disjoint_domain* and mappingbased_objects_disjoint_range* dumps
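
For readers unfamiliar with this kind of filter, here is a minimal Python sketch of the range half of the idea. It is not the actual DBpedia post-processing code: the function names, the toy type assignments and the assumed disjointness of Person and Place are illustrative only. A triple is dropped only when one of the object's known types is declared disjoint with the expected range, and objects with no clashing type are kept.

```python
# Hypothetical toy data; class and property names are illustrative only.
RANGE = {"mother": "Person"}
DISJOINT = {("Person", "Place")}  # assumed axiom, declared in one direction
TYPES = {"Elizabeth": {"Person"}, "England": {"Place"}}


def is_disjoint(a, b, disjoint):
    """owl:disjointWith is symmetric, so check both directions."""
    return (a, b) in disjoint or (b, a) in disjoint


def split_by_range(triples, ranges, types, disjoint):
    """Separate triples whose object type clashes with the expected range."""
    kept, removed = [], []
    for s, p, o in triples:
        expected = ranges.get(p)
        clash = expected is not None and any(
            is_disjoint(t, expected, disjoint) for t in types.get(o, set())
        )
        (removed if clash else kept).append((s, p, o))
    return kept, removed


triples = [
    ("Some_Person", "mother", "Elizabeth"),
    ("Some_Person", "mother", "England"),
    ("Some_Person", "mother", "Untyped_Thing"),
]
kept, removed = split_by_range(triples, RANGE, TYPES, DISJOINT)
```

Note that the untyped object survives: the filter relies on explicit disjointness declarations and does not enforce the range itself.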

JJ-Author commented 4 years ago

The post-processing is still in place. See e.g. https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects/2019.09.01. The ranges of datatype properties in fact steer the parsers during extraction and trigger unit conversion for the "specific properties" in this dedicated dataset.

In my opinion the domain / range of the properties in question is not defined well. It should be owl:Thing or some really generic class like MaterialThing for temperature. Moreover the mapping process should not accept the usage of properties like this, or at least show warnings.

Well-defined property domains and ranges in combination with the post-processing (domain / range check) aim exactly at filtering out false triples like the example ?s dbo:mother :England. When using only the filtered files for RDFS reasoning, it should work in most cases. What would be the advantages of using schema ranges / domains instead of rdfs w.r.t. reasoning and error filtering?

VladimirAlexiev commented 4 years ago

I didn't know about this post-processing. It's a good step, but not equivalent to enforcing rdfs:domain/range because it works based on explicit Disjoint declarations. This doesn't guarantee that no useful triples are thrown out, but may be a good compromise between keeping nonsensical triples and removing useful triples.

Good example: although http://dbpedia.org/ontology/firstAscentYear is defined only for dbo:Mountain, it will be preserved on dbo:Volcano because the two are not declared Disjoint (in fact both are subclasses of dbo:NaturalPlace).

What would be the advantages of using schema ranges / domains instead of rdfs w.r.t. reasoning and error filtering?

There are 100-200 Volcanos for which dbo:firstAscentYear is known:

select * {
  ?x a dbo:Volcano; dbo:firstAscentYear ?y
}

If you apply RDFS reasoning, all will be inferred dbo:Mountain. That may be ok for some of them, but the ontology creators didn't think it appropriate to declare Volcano subClassOf Mountain, so that inference is not right.

At present there are only about 25 disjointness axioms. Most are about dbo:Person, and they are not rendered symmetric:

select * {
  ?x owl:disjointWith ?y
}

But even that may be too restrictive, eg these disjoints

dbo:Agent vs dbo:Place, dbo:Organization vs wgs:SpatialThing

don't account for the often occurring conflation of an organization and its (headquarters) building, which happens especially often for museums/libraries.


Some other ontology queries I played with:

Number of dbo: props (note: there are over 55k raw dbp: props, and many of them make no sense):

select (count(*) as ?c) {
  ?x a rdf:Property
  filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
} 
2727

Breakdown into object vs data prop (there is a well-defined dichotomy):

select (count(*) as ?c) (sum(?obj) as ?object) (sum(?dat) as ?data) {
  ?x a rdf:Property
  filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
  bind(if(exists {?x a owl:ObjectProperty}, 1, 0) as ?obj)
  bind(if(exists {?x a owl:DatatypeProperty}, 1, 0) as ?dat)
} 
object 1105, data 1622

Props with defined range:

select (count(*) as ?c) {
  ?x a rdf:Property; rdfs:range ?range
  filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
} 
2450

Thus 277 props (2727 - 2450) have no defined range. Not all of them are data props, eg http://dbpedia.org/ontology/subClassis is an object prop:

select * {
  ?x a rdf:Property
  filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
  filter not exists {?x rdfs:range ?range}
}

Breakdown by data/obj, then range:

select ?range (count(*) as ?c) (sum(?obj) as ?object) (sum(?dat) as ?data) {
  ?x a rdf:Property
  filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
  bind(if(exists {?x a owl:ObjectProperty}, 1, 0) as ?obj)
  bind(if(exists {?x a owl:DatatypeProperty}, 1, 0) as ?dat)
  optional {?x rdfs:range ?range}
} group by ?range order by desc(?data) ?range

VladimirAlexiev commented 4 years ago

If you examine dbo:firstAscentPerson, you'll see plenty of nok: