dbpedia / mappings-tracker

This project is used for tracking mapping issues in mappings.dbpedia.org

dbpo:type #55

Open roland-c opened 9 years ago

roland-c commented 9 years ago

The mapping of Infobox Organisation contained a mapping [1] from template property 'type' to ontology property 'type'. The ontology property type (dbpo:type) [2] is declared equivalent to wikidata:p31, which is equivalent to rdf:type; therefore dbpo:type is equivalent to rdf:type. I think dbpo:type should be removed from the ontology if it is equivalent to rdf:type.
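The chain of reasoning above can be sketched in a few lines of Python. This is only a toy model that treats equivalentProperty as a symmetric, transitive relation; the second pair (wd:P31 equivalent to rdf:type) is the claim made in this comment, not something checked against the ontology, and the function name is hypothetical.

```python
def equivalence_classes(pairs):
    """Group terms into sets, treating each pair as a symmetric, transitive equivalence."""
    parent = {}

    def find(term):
        # Follow parent links to the representative, halving the path on the way.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())


# Declared equivalences as described in this comment (assumed, for illustration).
declared = [("dbo:type", "wd:P31"), ("wd:P31", "rdf:type")]
classes = equivalence_classes(declared)
print(classes)
```

All three properties end up in a single group, which is why one careless external mapping is enough to make dbo:type look like rdf:type to a reasoner.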

Roland

[1] http://mappings.dbpedia.org/index.php?title=Mapping_nl:Infobox_organisatie&oldid=32735
[2] http://mappings.dbpedia.org/index.php/OntologyProperty:Type http://dbpedia.org/ontology/type

VladimirAlexiev commented 9 years ago

dbo:type is not rdf:type, as you can see by searching for usages "ontologyproperty type".

E.g. https://en.wikipedia.org/wiki/Dumbarton_Bridge_(California) has 3 design->type values: dbr:Twin, dbr:Concrete, dbr:Girder_Bridge. No sane person would call these classes.

P31 "instance of" suffered from similar problems, but they've been cleaned up somewhat. E.g. all 2.4M people are now "Human" and not firefighters, inventors or whatnot. But Wikidata still has 16k classes, of which 2/3 have fewer than 5 instances, so more cleanup is needed.

Related question: why do we also have dct:type? Imho dbo:type and dct:type mean the same.

@jimkont: We should remove P31 from this prop. This shows yet again that external mappings are dangerous unless the mapper explores and understands the meaning.

jimkont commented 9 years ago

"@jimkont: We should remove P31 from this prop. This shows yet again that external mappings are dangerous unless the mapper explores and understands the meaning."

I agree to remove it but keep a comment pointing to this thread in case someone tries that again

I will put this on the agenda of the next dev telco: the best way to store wikidata mappings without ontology implications.

roland-c commented 9 years ago

I see the problem with the usage of dbpo:type in these mappings; it shows the need to further specify the "type" of an object described in an infobox. What you see there is that the type (Class) the infobox is mapped to needs an additional, detailed classification, as specified within the infobox. This detailed classification should relate semantically as a subclass to the class the infobox is mapped to (the rdf:type). If it doesn't, as in the example of design <> type [1], a different property should be used that expresses the detailed semantics of the relation.

Imho we also have dct:type because Dublin Core Terms was at first developed without much consideration for RDF. DCMI states the following about the coexistence of rdf:type and dcterms:type: "It is recommended that RDF applications use explicit rdf:type triples, even if that means creating a separate DCAM description of the value.

The property dcterms:type has semantics very similar to rdf:type. At the time of writing, the precise relationship between those properties remains undecided. It is recommended that RDF applications implementing this specification primarily use and understand rdf:type in place of dcterms:type when expressing Dublin Core metadata in RDF, as most RDF processors come with built-in knowledge of rdf:type." [2]

I think it will be much clearer to have only rdf:type as the main property for classifying an infobox. If there is further classification within the infobox, it must be a subproperty of rdf:type. Anything else should have another property mapped, and not dbpo:type.

[1] https://en.wikipedia.org/wiki/Dumbarton_Bridge_(California)
[2] http://dublincore.org/documents/dc-rdf/

VladimirAlexiev commented 9 years ago

@roland-c

"dcterms:type has semantics very similar to rdf:type"

rdf:type has actionable RDFS and OWL semantics that dct:type does not.

You seem to ascribe special meaning to dbo:type that just isn't there. dbo:type, dct:type, schema:additionalType: they are all used to attach a "business" or "application" type to a resource, with no strict meaning of this extra type, and no expectation of set inclusion.

"What you see there is that the type (Class) the infobox is mapped to needs an additional, detailed classification, as specified within the infobox ... it will be much clearer to have only rdf:type as the main property for classifying an infobox. If there is further classification within the infobox, it must be a subproperty of rdf:type."

I don't think we should use rdf:type (or a subprop thereof) to point to something that is uncontrolled. rdf:type should be used only with classes in the DBO: those are not tightly controlled but at least are visible in the class hierarchy http://mappings.dbpedia.org/server/ontology/classes/ and some bright minds are maybe looking at them critically.

If rdf:type is used with random dbpedia resources, this will lead to the same disaster as currently in Wikidata: 16k classes of which 2/3 don't even have 5 instances. My observation above bears this out: neither Twins nor Concrete are Bridges. GirderBridges are Bridges, so we can consider this as a subclass, but I'd rather have someone add it explicitly (after checking how many instances) than an uncontrolled DBResource. (That the wikipedia field here is called "design" not "type" has little bearing imho).
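To make the concern concrete, here is a minimal Python sketch of what happens to the set of classes if dbo:type values are promoted to rdf:type. The three dbo:type values are the ones quoted earlier in this thread; the rdf:type dbo:Bridge triple and the function names are assumptions added for illustration.

```python
# Simplified triples for one resource. The dbo:type values come from the thread;
# the rdf:type dbo:Bridge triple is an assumption for illustration.
TRIPLES = [
    ("dbr:Dumbarton_Bridge_(California)", "rdf:type", "dbo:Bridge"),
    ("dbr:Dumbarton_Bridge_(California)", "dbo:type", "dbr:Twin"),
    ("dbr:Dumbarton_Bridge_(California)", "dbo:type", "dbr:Concrete"),
    ("dbr:Dumbarton_Bridge_(California)", "dbo:type", "dbr:Girder_Bridge"),
]


def classes_in_use(triples):
    """Return every object of an rdf:type triple, i.e. everything used as a class."""
    return {o for _, p, o in triples if p == "rdf:type"}


def promote_to_rdf_type(triples, prop="dbo:type"):
    """Rewrite `prop` to rdf:type, as an equivalence between the two would imply."""
    return [(s, "rdf:type" if p == prop else p, o) for s, p, o in triples]


before = classes_in_use(TRIPLES)
after = classes_in_use(promote_to_rdf_type(TRIPLES))
print(sorted(after - before))  # resources that would newly count as classes
```

In this toy data a single bridge turns three ordinary resources into classes, none of which appears in the DBO class hierarchy.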

Can you give a useful example of a set of wikipedia resources that have a strict classification field, should be represented as a DBO class, but aren't yet?

VladimirAlexiev commented 8 years ago

Removed "equivalentProperty wd:P31".

VladimirAlexiev commented 8 years ago

Dct:type deletion log: "16:13, 30 January 2015 Mgns restored "OntologyProperty:Dct:type" (used in FileTypeExtractor code)".

I looked at the source: FileTypeExtractor is inconsistent as to whether it uses dc:type or dct:type:

fileTypeClass is set in FileTypeExtractorConfig.scala#L23: ontology.classes(result._1). It seems to me this returns dbo:StillImage etc., not dct:StillImage as claimed above.

Because the type is a URL, not a literal, we should use dbo:type or dct:type, not dc:type. I don't see any difference between dbo:type and dct:type, and because the former is used in >500 mappings I think we should replace dct:type with dbo:type in FileTypeExtractor.scala (and should make the prop name and comments there consistent!)

@jimkont & @gaurav, do you agree?

jimkont commented 8 years ago

The idea of using dct was to make the data use a more widely accepted vocab. Using dbo would miss that point, so in this case we could drop the triples completely. Regarding dc/dct, we use dct->dcterms: http://commons.dbpedia.org/page/File:DBpediaLogo.svg

jimkont commented 8 years ago

I fixed the comments to make it clear. I do not remember if it was dct:StillImage or dbo:StillImage; it was either a misleading comment or a bug in the code. (We emit dbo:StillImage.) @gaurav, it was a long time ago, but do you remember which one it was?

Regarding dct:type vs dbo:type, I am open to whatever we choose.

VladimirAlexiev commented 8 years ago

I guessed it emits dbo:StillImage. Nothing really wrong with this class, except that dctype:StillImage precedes it by decades. (Note it's dctype: not dct:, see http://prefix.cc/dctype.) So if we prefer dct:type for this particular case, we should be consistent and prefer dctype:StillImage.

VladimirAlexiev commented 8 years ago

dctype: aka DCMI Type Vocabulary includes classes: Collection , Dataset , Event , Image , InteractiveResource , MovingImage , PhysicalObject , Service , Software , Sound , StillImage , Text

gaurav commented 8 years ago

@jimkont It was definitely dbo:StillImage, which we decided would be identical to dcmitype:StillImage. http://mappings.dbpedia.org/index.php?title=OntologyClass:StillImage&diff=35472&oldid=35434

VladimirAlexiev commented 8 years ago

You know that you can use dctype:StillImage in the mapping wiki, right? E.g. see http://mappings.dbpedia.org/index.php/OntologyProperty:Dc:type. What's the purpose of making dbo terms exactly the same as other terms that precede them by decades? equivalentClass semantics say that both types must be applied to every item that has one of them, which is wasteful. And btw only the comment says it's equivalent; the equivalentClass field is not filled out.
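The "wasteful" point can be illustrated with a small Python sketch that materializes equivalentClass for a handful of type assertions. It only handles one hop of equivalence, the ex: resources are made up, and the dbo:StillImage / dctype:StillImage equivalence is assumed here purely for the sake of the example (as noted above, the equivalentClass field is not actually filled out).

```python
def materialize_equivalent_classes(type_assertions, equivalents):
    """Add (resource, class) pairs implied by one hop of class equivalence."""
    # Build a symmetric lookup: class -> set of classes declared equivalent to it.
    lookup = {}
    for a, b in equivalents:
        lookup.setdefault(a, set()).add(b)
        lookup.setdefault(b, set()).add(a)

    result = set(type_assertions)
    for resource, cls in type_assertions:
        for other in lookup.get(cls, ()):
            result.add((resource, other))
    return result


# Hypothetical file resources, each typed with only one of the two classes.
asserted = {("ex:file1", "dbo:StillImage"), ("ex:file2", "dctype:StillImage")}
# Assumed equivalence, for illustration only.
equivalents = [("dbo:StillImage", "dctype:StillImage")]

materialized = materialize_equivalent_classes(asserted, equivalents)
print(len(asserted), "asserted ->", len(materialized), "after materialization")
```

Every typed resource ends up carrying both classes, so the number of type triples doubles without adding any information.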

To summarize my position: the fewer classes and properties exist in the world, the better.
