DM2E / dm2e-mappings


Proposal: Check if resource has rdf:type #87

Open edroege opened 10 years ago

edroege commented 10 years ago

The validator should check if individuals (e.g. http://data.dm2e.eu/data/item/mpiwg/harriot/MPIWG_0HE26A22_00108 ) have an rdf:type (e.g. http://www.europeana.eu/schemas/edm/ProvidedCHO ).

This should be checked for all classes: CHO and Aggregation but also for contextual classes like Agents, Places, Timespans etc.

If no class is indicated, give a warning.

ksdm2e commented 10 years ago

how is it possible to have no rdf:type? Could you give an example?

kba commented 10 years ago

<http://agg1> edm:aggregatedCHO <http://cho1>

implies that <http://agg1> is an ore:Aggregation and <http://cho1> is an edm:ProvidedCHO, without making it explicit. What @edroege proposes means that the graph should contain these statements:

<http://agg1> rdf:type ore:Aggregation .
<http://cho1> rdf:type edm:ProvidedCHO .
<http://agg1> edm:aggregatedCHO <http://cho1> .

Many providers provide data such as:

<http://agg1> edm:isShownBy <http://webresource> .

without asserting anything about http://webresource, or at least without stating that it is an edm:WebResource.

edroege commented 10 years ago

@kba: Thanks for clarifying!

I think it has to be explicit. In the case of Web resources, Aggregations and CHOs we could derive this information afterwards, because edm:isShownBy, for example, has only a Web resource as its range. But in cases where the range is broader, like edm:Agent (with subclasses foaf:Person and foaf:Organization), we cannot tell whether the class edm:Agent, foaf:Person or foaf:Organization was meant.
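To illustrate the point about ranges, here is a minimal sketch in plain Python (no RDF library), with triples modelled as (subject, predicate, object) string tuples. The range table is an assumption made for illustration and is not taken from the DM2E model; in particular, `ex:hasAgent` is a hypothetical predicate standing in for any property whose range is edm:Agent.

```python
RDF_TYPE = "rdf:type"

# Illustrative range table (an assumption, not the DM2E specification).
# A single-class range lets us infer the object's type; a broad range does not.
RANGES = {
    "edm:aggregatedCHO": {"edm:ProvidedCHO"},
    "edm:isShownBy": {"edm:WebResource"},
    # Hypothetical agent-valued predicate, for illustration only.
    "ex:hasAgent": {"edm:Agent", "foaf:Person", "foaf:Organization"},
}


def candidate_types(triples):
    """Collect candidate classes for objects that have no explicit rdf:type."""
    typed = {s for s, p, _ in triples if p == RDF_TYPE}
    candidates = {}
    for _, p, o in triples:
        if p in RANGES and o not in typed:
            candidates.setdefault(o, set()).update(RANGES[p])
    return candidates


def is_ambiguous(classes):
    """More than one candidate class means the type cannot be recovered."""
    return len(classes) > 1
```

With this table, the object of edm:isShownBy gets exactly one candidate class, while the object of the agent-valued predicate gets three, so only an explicit rdf:type statement can settle which one was meant.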

d0rg0ld commented 10 years ago

@ksdm2e

<rdf:Description rdf:about="http://en.wikipedia.org/wiki/Oxford">
        <dc:title>Oxford</dc:title>
        <dc:coverage>Oxfordshire</dc:coverage>
        <dc:publisher>Wikipedia</dc:publisher>
        <region:population>10000</region:population>
        <region:principaltown rdf:resource="http://www.country-regions.fake/oxford"/>
    </rdf:Description>

Valid RDF -> You have no idea what type "http://en.wikipedia.org/wiki/Oxford" has

ksdm2e commented 10 years ago

Unfortunately, I don't get it. It seems to me to be a syntactical issue. When I look at the Turtle representation, the RDF/XML or even the HTML representation of the above-mentioned sample, there is always a clear declaration of the class, e.g.:

<http://data.dm2e.eu/data/item/bbaw/dta/16157>
      a       edm:ProvidedCHO ; ...

@d0rg0ld @edroege Where do you find such "orphaned", untyped triples?

kba commented 10 years ago

@ksdm2e I don't think that this issue is pertinent to DTA data; http://data.dm2e.eu/data/item/bbaw/dta/16157 is indeed not relevant here. I'll change the URL to a fitting example. You are using the variant of RDF/XML that enforces an rdf:type statement by syntax. You might still be susceptible to referring to things without asserting anything about them, though.

One place where I do notice this issue is MPIWG/Harriot: http://data.dm2e.eu/data/rdf/resourcemap/mpiwg/harriot/MPIWG_0HE26A22/20140306195409535?output=ttl. Resolving one random page, http://data.dm2e.eu/data/rdf/resourcemap/mpiwg/harriot/MPIWG_0HE26A22_00104/20140306195409535?output=ttl , shows that there are no rdf:type statements in there.

d0rg0ld commented 10 years ago

@kba yeah, it's because they use the <rdf:Description rdf:about=...> way of representing RDF/XML

kba commented 10 years ago

@d0rg0ld Actually, MPIWG delivers N-Triples, but the pitfalls are the same.

d0rg0ld commented 10 years ago

@kba ok I was referring to an old sample I got months ago ...

edroege commented 10 years ago

@ksdm2e I just took some example URIs to make my point clearer. Your data is fine - sorry if you thought that there was something wrong. I was suggesting adding a warning or something similar to the validator, not correcting ingestions.

We will make it clearer in the next revision of the model specification that the mappings should contain classes. Can the validator check during the ingestion if e.g. the class edm:ProvidedCHO occurs as often as the property edm:aggregatedCHO? Or is this too complicated?
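A minimal sketch of such a count comparison, assuming triples are available as (subject, predicate, object) string tuples with prefixed names; the function name is hypothetical and this is not the validator's actual code. It counts distinct resources rather than raw statements, so a CHO that is aggregated twice is counted once.

```python
def compare_cho_counts(triples):
    """Return (number of resources typed edm:ProvidedCHO,
    number of distinct objects of edm:aggregatedCHO)."""
    typed_chos = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "edm:ProvidedCHO"
    }
    aggregated = {o for _, p, o in triples if p == "edm:aggregatedCHO"}
    return len(typed_chos), len(aggregated)
```

Equal counts do not prove that the same resources are involved, so a per-resource check is stricter than comparing totals.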

kba commented 10 years ago

I will implement the requested behavior in the validator: check that every object in a '?s edm:aggregatedCHO ?object' triple is an (i.e. has rdf:type) edm:ProvidedCHO, and that every object in a triple with a WebResource-related predicate (edm:hasView, edm:isShownAt, edm:isShownBy ...) is an (i.e. has rdf:type) edm:WebResource. I will notify you once it is deployed.
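The per-object check described here could look roughly like the following sketch. It is plain Python over (subject, predicate, object) string tuples, not the validator's actual implementation. The predicate table is an assumption: it uses the EDM names edm:isShownAt and edm:isShownBy and lists only the predicates named in this thread, so it is not complete.

```python
# Expected class of the object for each checked predicate (illustrative,
# limited to the predicates mentioned in this thread).
EXPECTED_OBJECT_TYPE = {
    "edm:aggregatedCHO": "edm:ProvidedCHO",
    "edm:hasView": "edm:WebResource",
    "edm:isShownAt": "edm:WebResource",
    "edm:isShownBy": "edm:WebResource",
}


def check_object_types(triples):
    """Return one warning per triple whose object lacks the expected rdf:type."""
    # Index all explicit rdf:type statements by resource.
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    warnings = []
    for _, p, o in triples:
        expected = EXPECTED_OBJECT_TYPE.get(p)
        if expected is not None and expected not in types.get(o, set()):
            warnings.append(
                f"WARNING: <{o}> is the object of {p} but has no rdf:type {expected}"
            )
    return warnings
```

Because the lookup goes through the object of each triple, this also catches resources that are only ever referenced and never described, such as the http://webresource case mentioned earlier in the thread.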

kba commented 10 years ago

OK, the validator will now emit a WARNING for every subject in a file that has no 'rdf:type' statements. Deployed since build 'Mon Apr 7 23:12:14 CEST 2014', please re-download.
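For readers who want to reproduce the idea locally, here is a rough sketch of a "subject without rdf:type" check in plain Python. It assumes triples as (subject, predicate, object) string tuples; the function name and message format are hypothetical and do not reflect the deployed validator.

```python
def untyped_subjects(triples):
    """Return, sorted, every subject that has no rdf:type statement."""
    typed = {s for s, p, _ in triples if p == "rdf:type"}
    subjects = {s for s, _, _ in triples}
    return sorted(subjects - typed)


def warnings_for(triples):
    """Format one WARNING line per untyped subject."""
    return [
        f"WARNING: <{s}> has no rdf:type statement"
        for s in untyped_subjects(triples)
    ]
```

Note that a check phrased over subjects does not flag resources that appear only in object position, so it complements rather than replaces a check on the objects of specific predicates.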