Closed VladimirAlexiev closed 9 years ago
The ontology snapshot for the 2015-04 release does not contain this relation (http://dbpedia.org/ontology/CelestialBody), so the static release is fine.
For DBpedia Live, the update should finish rolling out within a few days or weeks. Subclass relations are hard to feed into Live, so we rely on the unmodified feeder (which re-checks articles not extracted for, IIRC, 60 days) to complete the change.
If you run the last query on http://dbpedia.org/sparql, it's worse: over 1000 places, including River, Stream, BodyOfWater, Country, PopulatedPlace, Settlement.
The latter two come from the heuristic typing method SDType, which uses ingoing properties to type instances. Both examples mentioned have ingoing relations of type almaMater, which tell the typing algorithm that they should be universities.
@HeikoPaulheim Then Heuristic Typing (partially?) implements rdfs:range, which is detrimental. Why: because Wikipedia editors put all kinds of shit in template fields, so most DBO ranges are wishful thinking. Before adding a type, Heuristic Typing must check whether the target already has some types, and assume that sibling types are disjoint.
If you don't do this, you'll wreak havoc by inferring that various countries are persons and vice versa.
You'll also infer that these are people:
Archbishop, Corfu, All My Children, Adoption, Kajang, Prehistory
See http://vladimiralexiev.github.io/pres/20150209-dbpedia/dbpedia-problems.html#/sec-7-3.
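The check proposed above (look at the target's existing types first, and treat sibling classes as disjoint) could be sketched roughly as follows. This is only an illustration: the `PARENT` map is a hypothetical, hand-written fragment of a class tree rather than the real DBpedia ontology, and the function names are made up. It assumes a single-parent hierarchy, in which "siblings are disjoint" means two classes are compatible only when one is an ancestor of the other.

```python
# Hypothetical fragment of a class tree (child -> parent), for illustration only.
PARENT = {
    "Person": "Agent",
    "Organisation": "Agent",
    "University": "Organisation",
    "Agent": "Thing",
    "Place": "Thing",
    "PopulatedPlace": "Place",
    "Country": "PopulatedPlace",
}

def ancestors(cls):
    """Return cls followed by all of its superclasses, nearest first."""
    chain = [cls]
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain

def compatible(a, b):
    """With disjoint siblings, two classes can co-occur only on one ancestor chain."""
    return a in ancestors(b) or b in ancestors(a)

def accept_inferred_type(existing_types, candidate):
    """Accept a heuristic type only if it is compatible with every existing type."""
    return all(compatible(candidate, t) for t in existing_types)

# A resource already typed as a country must not become a person.
print(accept_inferred_type({"Country", "PopulatedPlace", "Place"}, "Person"))
# An untyped resource accepts any candidate.
print(accept_inferred_type(set(), "University"))
```

An empty set of existing types accepts everything, so this check alone does nothing for resources that have no types at all.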
@VladimirAlexiev SDType learns type distributions as they are actually deployed in DBpedia, not as they are defined as rdfs:range in the ontology. In the almaMater example I gave, university is assumed because the vast majority of objects of that property have that type, not because it's defined in the ontology.
Furthermore, on DBpedia, we apply it only to instances that have no types beforehand, so no inconsistencies are introduced for those entities.
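To make the idea of learning type distributions from ingoing properties concrete, here is a minimal sketch. It is not the SDType implementation: it uses a plain unweighted average over ingoing properties, which is a simplification and may differ from what is deployed. The toy triples, the resource names, and the `threshold` default are all hypothetical.

```python
from collections import Counter, defaultdict

def learn_type_distributions(triples, types):
    """For each property, estimate how often its (typed) objects carry each type."""
    counts = defaultdict(Counter)
    totals = Counter()
    for _s, p, o in triples:
        if o in types:  # learn only from objects that already have types
            totals[p] += 1
            for t in types[o]:
                counts[p][t] += 1
    return {p: {t: n / totals[p] for t, n in c.items()} for p, c in counts.items()}

def predict_types(resource, triples, dist, threshold=0.5):
    """Average the per-property distributions over the resource's ingoing properties."""
    ingoing = [p for _s, p, o in triples if o == resource and p in dist]
    if not ingoing:
        return {}
    scores = Counter()
    for p in ingoing:
        for t, prob in dist[p].items():
            scores[t] += prob / len(ingoing)
    # Keep only types whose averaged score reaches the confidence threshold.
    return {t: s for t, s in scores.items() if s >= threshold}

# Toy data: three universities and one town appear as objects of almaMater.
triples = [
    ("Alice", "almaMater", "MIT"),
    ("Bob", "almaMater", "Oxford"),
    ("Carol", "almaMater", "Harvard"),
    ("Eve", "almaMater", "Springfield"),
    ("Dave", "almaMater", "Untyped_Thing"),
]
types = {
    "MIT": {"University"},
    "Oxford": {"University"},
    "Harvard": {"University"},
    "Springfield": {"Settlement"},
}
dist = learn_type_distributions(triples, types)
print(predict_types("Untyped_Thing", triples, dist))
```

Here the untyped resource is typed as a university purely because most almaMater objects in the data are universities, independent of any rdfs:range declaration.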
@HeikoPaulheim
@VladimirAlexiev re 1: it has been shown empirically that this approach, combined with some post-processing, gives reasonable results for many use cases. The approach deployed for DBpedia is configured to achieve 95% precision. re 2: the instance at hand actually did not have types before, see [1]. re 3: we apply a confidence threshold, but no consistency checking of the solution is used.
[1] http://downloads.dbpedia.org/preview.php?file=2015-04_sl_core-i18n_sl_en_sl_instance-types_en.nt.bz2
re 2. It uses "Football club infobox", which is mapped at http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_football_club, so it does have types. See http://mappings.dbpedia.org/server/extraction/en/extract?title=FC_Minsk&format=turtle-triples. The URL you gave is a very small sampling.
Sorry, it was meant to be [1]. Still, in that file, there are no types for the instance, which is why SDType attempts to type it.
[1] http://downloads.dbpedia.org/2015-04/core-i18n/en/instance-types_en.nt.bz2
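One way to verify such a claim yourself is to stream the compressed dump and collect the rdf:type statements for a single subject. The sketch below assumes the dump has been downloaded locally and contains one N-Triples statement per line. The function names are hypothetical, and the whitespace-based parsing is a shortcut that works for IRI-only triples, not a full N-Triples parser.

```python
import bz2

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def types_of(lines, subject_iri):
    """Collect the rdf:type objects of one subject from N-Triples lines."""
    subject = f"<{subject_iri}>"
    found = []
    for line in lines:
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0] == subject and parts[1] == RDF_TYPE:
            # Drop the trailing " ." and the angle brackets around the IRI.
            found.append(parts[2].rstrip(" .").strip("<>"))
    return found

def types_in_dump(path, subject_iri):
    """Stream a local .nt.bz2 file without decompressing it to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        return types_of(fh, subject_iri)

# In-memory sample; a real call would be types_in_dump("<local path>", iri).
sample = [
    "<http://dbpedia.org/resource/FC_Minsk-2> " + RDF_TYPE
    + " <http://dbpedia.org/ontology/SoccerClub> .",
]
print(types_of(sample, "http://dbpedia.org/resource/FC_Minsk"))
print(types_of(sample, "http://dbpedia.org/resource/FC_Minsk-2"))
```

Matching on the exact subject token matters here: a substring search for `FC_Minsk` would also hit `FC_Minsk-2` and give the wrong impression that the main resource is typed.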
re 2: Right, instance-types_en.ttl doesn't define any types for FC_Minsk. It does, however, have a type for an IntermediateNode related to it:
<http://dbpedia.org/resource/FC_Minsk-2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/SoccerClub>
@jimkont: this is strange, because even the oldid page version that the current dbpedia.org data was extracted from has the Football club infobox template. That template has been mapped to the appropriate class practically forever.
SDTyped defines these for FC_Minsk:
<http://dbpedia.org/ontology/Organisation> .
<http://dbpedia.org/ontology/SoccerClub> .
<http://www.wikidata.org/entity/Q486972> .
<http://www.w3.org/2002/07/owl#Thing> .
<http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent> .
<http://dbpedia.org/ontology/Agent> .
<http://www.wikidata.org/entity/Q43229> .
<http://dbpedia.org/ontology/PopulatedPlace> .
<http://schema.org/SportsTeam> .
<http://dbpedia.org/ontology/SportsTeam> .
<http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#SocialPerson> .
<http://schema.org/Organization> .
re 3. It would be really nice to implement some disjointness.
re 1. I evaluated the first 11 items in SDTyped
1 wrong, 3 somewhat incomplete, 8 perfect. Cheers!
Querying http://live.dbpedia.org/sparql.