VladimirAlexiev opened this issue 8 years ago
What is the problem with the following?
maximumTemperature domain PhysicalObject
Planet subClassOf PhysicalObject
Lake subClassOf BodyOfWater
Sea subClassOf BodyOfWater
BodyOfWater subClassOf PhysicalObject
If you cannot say anything about the domain, you can still add a HasTemperature
class, but with a little thinking it is possible to find something meaningful.
@inf3rno The current hierarchy is BodyOfWater<NaturalPlace and Planet<CelestialBody. Of course you could introduce a superclass PhysicalObject or HasTemperature (the latter is usually called a mixin). The problem is that the creators of the ontology have not found this useful so far.
So I would say that your proposals introduce useless abstract classes. "Useless" is not an overly strong term: schema.org has rejected the creation of "Agent", a super-class of Person and Organization, after a substantial discussion. I personally think such a class is needed, but the community has disagreed.
Why should we accept useless abstract classes? Where do we stop: do we also introduce mixins like Nameable, Measurable, and so on? Better to use polymorphic characteristics like schema:domainIncludes instead of monomorphic ones like rdfs:domain.
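To illustrate the difference (ex: is just an example namespace and the property is hypothetical): rdfs:domain makes a reasoner infer the type of every subject, while schema:domainIncludes only documents intended usage and entails nothing.

# monomorphic: every subject of ex:maxTemperature is entailed to be an ex:Planet
ex:maxTemperature rdfs:domain ex:Planet .
# polymorphic: the property is documented as applicable to either class,
# but no type inference is licensed
ex:maxTemperature schema:domainIncludes ex:Planet , ex:BodyOfWater .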
We really discussed a lot about how to structure the DBpedia Ontology. I think in the end it is super convenient to have it flat and simple. This makes it more maintainable and keeps the props easier to use for data integration.
If you want to complicate it, you now have the option to make your own ontology on the Databus (it can be fetched from GitHub or the OntologyURL) and then we can evolve and load it alongside. This gives you the freedom to make everything abstract without tedious discussion. On the other hand, if people like your work they can use it. This is a much simpler process.
I am thinking about something called OntoFlow (like GitFlow) or "OntoGrate" (Ontology + Integration). This is what we need: non-blocking editing, not an ineffective ontology committee and endless discussion.
@VladimirAlexiev I think irrelevant is the proper word here. It is like programming: you add only the relevant part of reality to your model. Sometimes you even use estimation, simplification, etc. If you don't want to use inference and just want something that works, then using schema.org properties is better.
@inf3rno Right! You can't/shouldn't use RDFS inference with DBpedia. E.g. from ?x dbo:parent ?y you can't conclude anything about ?x and ?y.
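One quick way to see what RDFS would license here is to look up the declared domain and range of dbo:parent on the public endpoint (a simple inspection query; the result depends on the ontology version loaded):

select * {
  optional { dbo:parent rdfs:domain ?domain }
  optional { dbo:parent rdfs:range ?range }
}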
@VladimirAlexiev Just because the hierarchy is modest doesn't mean you can't do RDFS reasoning. Actually, a lot of people do RDFS inferencing. Also OWL inference such as owl:sameAs and owl:equivalentClass is there, and more sophisticated stuff will come soon. Although I would like to exploit SHACL more.
So why do you say this?
@VladimirAlexiev also a question. Do you know the correct semantic interpretation of:
dbo:maxTemp rdfs:domain dbo:BodyOfWater .
dbo:maxTemp rdfs:domain dbo:Planet .
This is interpreted as inferring both types, right?
Also we have type-specific properties:
dbo:Planet/maxTemperature and dbo:BodyOfWater/maxTemperature
@kurzum yes, it should infer that the subject is both Planet and BodyOfWater, which is nonsense (except in the Waterworld movie).
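A minimal sketch of why both types follow (the lake instance and the value are purely illustrative):

# the two hypothetical domain axioms from the question above
dbo:maxTemp rdfs:domain dbo:BodyOfWater .
dbo:maxTemp rdfs:domain dbo:Planet .
# one data triple (illustrative)
dbr:Lake_Baikal dbo:maxTemp 10.0 .
# RDFS rule rdfs2 fires once per domain axiom, so both types are entailed:
dbr:Lake_Baikal a dbo:BodyOfWater .   # plausible
dbr:Lake_Baikal a dbo:Planet .        # nonsense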
Similarly for parent, RDFS reasoning will infer many things to be Persons which they are not. E.g. the second object from this infobox line:
mother = [[Elizabeth]], queen of [[England]]
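For concreteness, assuming dbo:mother carries rdfs:range dbo:Person (the subject IRI below is made up), RDFS reasoning over the extracted triple yields a wrong type:

# second object extracted from the infobox line above (subject is illustrative)
dbr:Some_Person dbo:mother dbr:England .
# with dbo:mother rdfs:range dbo:Person, RDFS rule rdfs3 entails:
dbr:England a dbo:Person .   # wrong: England is a country, not a person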
This is exactly why I've proposed this issue.
In DBpedia, domain and range are purely advisory because the extractor does not enforce them. (That was the case 3 years ago, and is still true afaik.) Enforcing them won't be easy, and would throw out triples we're not certain should be removed.
Type-specific props are a bad idea because
Further, there are no type-specific Object props, so they are not relevant to the discussion.
In DBpedia, domain and range are purely advisory because the extractor does not enforce them. (That was the case 3 years ago, and is still true afaik.)
There has been a post-processing clean-up step that is configured to remove such triples. It is easily extensible but is currently configured to remove triples when the object is of a type that is owl:disjointWith the expected range, and the same for the rdfs:domain. See the mappingbased_objects_disjoint_domain* and mappingbased_objects_disjoint_range* dumps.
The post-processing is still in place; see e.g. https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects/2019.09.01. Ranges of datatype properties in fact steer the parsers during extraction and trigger unit conversion for the "specific properties" in this dedicated dataset.
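Roughly, the range half of that check could be approximated with a query like the following (a sketch of the idea, not the actual post-processing code; in practice both directions of owl:disjointWith would have to be checked, since usually only one is asserted):

# flag object triples whose object has a type declared disjoint with the property's range
select ?s ?p ?o {
  ?p rdfs:range ?expected .
  ?expected owl:disjointWith ?conflicting .
  ?s ?p ?o .
  ?o a ?conflicting .
}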
In my opinion the domain/range of the properties in question is not defined well. It should be owl:Thing or some really generic class like MaterialThing for temperature. Moreover, the mapping process should not accept the usage of properties like this, or at least show warnings.
Well-defined property domains and ranges, in combination with the post-processing (domain/range check), aim exactly at filtering out false triples like the example ?s dbo:mother :England. When the filtered files are used, RDFS reasoning should work in most cases. What would be the advantages of using schema.org ranges/domains instead of RDFS w.r.t. reasoning and error filtering?
I didn't know about this post-processing. It's a good step, but not equivalent to enforcing rdfs:domain/range, because it works only from explicit disjointness declarations. It doesn't guarantee that no useful triples are thrown out, but it may be a good compromise between keeping nonsensical triples and removing useful ones.
Good example: although http://dbpedia.org/ontology/firstAscentYear is defined only for dbo:Mountain, it will be preserved on dbo:Volcano because the two are not declared Disjoint (in fact both are subclasses of dbo:NaturalPlace).
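This can be checked directly on the endpoint (the answer of course depends on the loaded ontology version):

ask {
  { dbo:Volcano owl:disjointWith dbo:Mountain }
  union
  { dbo:Mountain owl:disjointWith dbo:Volcano }
}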
What would be the advantages of using schema.org ranges/domains instead of RDFS w.r.t. reasoning and error filtering?
There are 100-200 Volcanos for which dbo:firstAscentYear is known:
select * {
?x a dbo:Volcano; dbo:firstAscentYear ?y
}
If you apply RDFS reasoning, all of them will be inferred to be dbo:Mountain. That may be ok for some of them, but the ontology creators didn't think it appropriate to declare Volcano subClassOf Mountain, so that inference is not right.
At present there are only about 25 disjointness axioms. Most are about dbo:Person, and they are not asserted symmetrically (owl:disjointWith is semantically symmetric, but only one direction is declared):
select * {
?x owl:disjointWith ?y
}
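The asymmetry can be listed explicitly with a variation of the query above:

# disjointness axioms whose reverse direction is not asserted
select ?x ?y {
  ?x owl:disjointWith ?y
  filter not exists { ?y owl:disjointWith ?x }
}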
But even that may be too restrictive. E.g. the disjointness axioms dbo:Agent vs dbo:Place and dbo:Organization vs wgs:SpatialThing don't account for the frequently occurring conflation of an organization and its (headquarters) building, which happens especially often for museums and libraries.
Some other ontology queries I played with:
Number of dbo: props (note: there are over 55k raw dbp: props, and many of them make no sense):
select (count(*) as ?c) {
?x a rdf:Property
filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
}
2727
Breakdown into object vs data prop (there is a well-defined dichotomy):
select (count(*) as ?c) (sum(?obj) as ?object) (sum(?dat) as ?data) {
?x a rdf:Property
filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
bind(exists {?x a owl:ObjectProperty} as ?obj)
bind(exists {?x a owl:DatatypeProperty} as ?dat)
}
object 1105, data 1622
Props with defined range:
select (count(*) as ?c) {
?x a rdf:Property; rdfs:range ?range
filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
}
2450
Thus 277 props have no defined range. Not all of them are data props; e.g. http://dbpedia.org/ontology/subClassis is an object prop:
select * {
?x a rdf:Property
filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
filter not exists {?x rdfs:range ?range}
}
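To see how many of those 277 are object props, the same filters can be restricted to owl:ObjectProperty (a variation of the queries above):

select (count(*) as ?c) {
?x a owl:ObjectProperty
filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
filter not exists {?x rdfs:range ?range}
}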
Breakdown by data/obj, then range:
select (count(*) as ?c) (sum(?obj) as ?object) (sum(?dat) as ?data) {
?x a rdf:Property
filter(strstarts(str(?x),"http://dbpedia.org/ontology"))
bind(exists {?x a owl:ObjectProperty} as ?obj)
bind(exists {?x a owl:DatatypeProperty} as ?dat)
optional {?x rdfs:range ?range}
} group by ?range order by desc(?data) ?range
If you examine dbo:firstAscentPerson, you'll see plenty of values that are not ok.
This need is also borne out by domain/range validation: http://mappings.dbpedia.org/validation/index.html. E.g. filter to lang="en", predicate="temp": this shows that maximumTemperature is only defined for Planet but is also used for Lake, Sea, etc. Tracing the class hierarchy doesn't show a useful super-class of these, so we need two domain classes: Planet and BodyOfWater.
Therefore I would suggest allowing several domains & ranges in the ontology definition.
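One way to express several domains without the conjunctive rdfs:domain semantics would be a union class expression, or the schema:domainIncludes approach mentioned earlier (a sketch using the maximumTemperature example, not a concrete proposal for the ontology files):

# a single rdfs:domain whose value is the union of the acceptable classes
dbo:maximumTemperature rdfs:domain [
    a owl:Class ;
    owl:unionOf ( dbo:Planet dbo:BodyOfWater )
] .

# alternatively, document the intended domains without any inference semantics
dbo:maximumTemperature schema:domainIncludes dbo:Planet , dbo:BodyOfWater .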