ec-geolink / d1lod

DataONE Linked Open Data graph

Address sanity check #41

Open amoeba opened 8 years ago

amoeba commented 8 years ago

Hi Bryce,

Here are the results of the sanity check for the D1 graph. Btw, if you say yes to item #12, I will go ahead and make a small addition to the ontology.

  1. Non-literal value of rdfs:label ==> the value of rdfs:label should be a literal, so even though it's a URI, it needs to be given as a literal, preferably explicitly typed.

For example, the following triple was caught by sanity check:

genid-1b5f3f88129d472fa8268082724d074a-node1710545 rdfs:label https://pasta.lternet.edu/package/metadata/eml/knb-lter-arc/20023/1

The correct one should have been:

genid-1b5f3f88129d472fa8268082724d074a-node1710545 rdfs:label "https://pasta.lternet.edu/package/metadata/eml/knb-lter-arc/20023/1"^^http://www.w3.org/2001/XMLSchema#anyURI

or alternatively, if the xsd prefix is defined:

genid-1b5f3f88129d472fa8268082724d074a-node1710545 rdfs:label "https://pasta.lternet.edu/package/metadata/eml/knb-lter-arc/20023/1"^^xsd:anyURI
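To make the typed form concrete, here is a minimal sketch of how such a literal could be built when writing N-Triples by hand. The helper names `escape_literal` and `typed_literal` are made up for illustration and are not part of d1lod; the sketch only assumes the standard N-Triples syntax, where the datatype IRI is written in angle brackets.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes and line breaks for an N-Triples string."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def typed_literal(value: str, datatype: str = XSD + "anyURI") -> str:
    """Return an N-Triples typed literal of the form "value"^^<datatype>."""
    return f'"{escape_literal(value)}"^^<{datatype}>'


# Hypothetical usage: the label value from the example above.
label = typed_literal(
    "https://pasta.lternet.edu/package/metadata/eml/knb-lter-arc/20023/1"
)
print(label)
```

The same helper would cover the xsd:string case by passing `XSD + "string"` as the datatype.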

  2. Found usage of glbase:hasIdentifierValue with a non-literal value ==> glbase:hasIdentifierValue is a datatype property, so it should point to a literal value of type xsd:string.

For example, the following triple was caught:

genid-1b5f3f88129d472fa8268082724d074a-node1710545 http://schema.geolink.org/1.0/base/main#hasIdentifierValue https://pasta.lternet.edu/package/metadata/eml/knb-lter-arc/20023/1

The correct one should have been (assuming that the xsd prefix is defined):

genid-1b5f3f88129d472fa8268082724d074a-node1710545 http://schema.geolink.org/1.0/base/main#hasIdentifierValue "https://pasta.lternet.edu/package/metadata/eml/knb-lter-arc/20023/1"^^xsd:string

  3. Found usage of glbase:hasLandingPage with a non-literal value ==> glbase:hasLandingPage is a datatype property, so it should point to a literal value of type xsd:anyURI.

For example, the following triple was caught:

http://lod.dataone.org/dataset/000440bc-c10b-46c7-aaa5-63794f1a9eac http://schema.geolink.org/1.0/base/main#hasLandingPage https://search.dataone.org/#view/000440bc-c10b-46c7-aaa5-63794f1a9eac

and the correct one should be:

http://lod.dataone.org/dataset/000440bc-c10b-46c7-aaa5-63794f1a9eac http://schema.geolink.org/1.0/base/main#hasLandingPage "https://search.dataone.org/#view/000440bc-c10b-46c7-aaa5-63794f1a9eac"^^xsd:anyURI
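As a rough illustration of the two datatype-property checks above (glbase:hasIdentifierValue and glbase:hasLandingPage), the sketch below flags object terms that are not literals of the expected type. It is not the script that produced these results. The function name `check_object` and the table `EXPECTED_DATATYPE` are hypothetical, and the object is assumed to be an N-Triples term (`<iri>`, `_:bnode`, or a quoted literal).

```python
GLBASE = "http://schema.geolink.org/1.0/base/main#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Expected datatypes for the two properties discussed above.
EXPECTED_DATATYPE = {
    GLBASE + "hasIdentifierValue": XSD + "string",
    GLBASE + "hasLandingPage": XSD + "anyURI",
}


def check_object(predicate, obj):
    """Return a short problem description, or None if nothing was flagged."""
    expected = EXPECTED_DATATYPE.get(predicate)
    if expected is None:
        return None  # not a property this sketch knows about
    if not obj.startswith('"'):
        return "non-literal value"
    if expected == XSD + "string" and obj.endswith('"'):
        return None  # RDF 1.1 treats a plain literal as xsd:string
    if not obj.endswith(f"^^<{expected}>"):
        return f"literal is not typed as <{expected}>"
    return None


# Hypothetical usage with the landing page triple caught above.
print(
    check_object(
        GLBASE + "hasLandingPage",
        "<https://search.dataone.org/#view/000440bc-c10b-46c7-aaa5-63794f1a9eac>",
    )
)
```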

  4. Found triples that use the property http://schema.geolink.org/1.0/base/main#nameFull ==> should use http://schema.geolink.org/1.0/base/main#hasFullName instead
  5. Found triples that use the property http://schema.geolink.org/1.0/base/main#nameGiven ==> should use http://schema.geolink.org/1.0/base/main#hasGivenName instead
  6. Found triples that use the property http://schema.geolink.org/1.0/base/main#nameFamily ==> should use http://schema.geolink.org/1.0/base/main#hasFamilyName instead
  7. Found triples that use the property http://schema.geolink.org/1.0/base/main#namePrefix ==> should use http://schema.geolink.org/1.0/base/main#hasNamePrefix instead
  8. Found triples that use the property http://schema.geolink.org/1.0/base/main#description ==> should use http://schema.geolink.org/1.0/base/main#hasDescription instead
  9. Found triples with Chinese text in the value of the description property ==> I did not check for language tags, so I found these by coincidence. It's good practice to add a language tag if you haven't already :-)
  10. The property http://schema.geolink.org/1.0/base/main#hasIdentifierResolveURL is used ==> I thought it should be http://schema.geolink.org/1.0/base/main#hasIdentifierResolveURI?
  11. Found triples that use the property http://schema.geolink.org/1.0/base/main#dateUploaded ==> should use http://schema.geolink.org/1.0/base/main#hasUploadDate instead
  12. Found triples that use the http://schema.geolink.org/1.0/base/main#address property. For example,

http://lod.dataone.org/organization/urn:uuid:0049e5cb-7be4-480b-8d15-84ee1251165f http://schema.geolink.org/1.0/base/main#address "Department of Biology MSC03 2020, 1 University of New Mexico Albuquerque NM 87131-1091 US"

==> We don't have any property named http://schema.geolink.org/1.0/base/main#address in the ontology, and there is no property that can be used to point to an organization's address. We can add one to the ontology, and if we do, I would suggest that we use http://schema.geolink.org/1.0/base/main#hasPhysicalAddress with domain Organization and range xsd:string. Note, though, that I'm not sure whether other data providers have organizations' physical addresses in their triples.

What do you think?
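The plain renames listed above (nameFull, nameGiven, nameFamily, namePrefix, description, dateUploaded) could be applied with a simple lookup table. This is only a sketch: `RENAMES` and `rename_predicate` are hypothetical names, the target for namePrefix is assumed to be hasNamePrefix, and hasIdentifierResolveURL and address are left out because they are still open questions.

```python
GLBASE = "http://schema.geolink.org/1.0/base/main#"

# Old local name -> suggested replacement, copied from the list above.
# "hasNamePrefix" is an assumption about the intended target name.
RENAMES = {
    "nameFull": "hasFullName",
    "nameGiven": "hasGivenName",
    "nameFamily": "hasFamilyName",
    "namePrefix": "hasNamePrefix",
    "description": "hasDescription",
    "dateUploaded": "hasUploadDate",
}


def rename_predicate(predicate: str) -> str:
    """Map an old glbase property IRI to its replacement; leave others unchanged."""
    if not predicate.startswith(GLBASE):
        return predicate
    local = predicate[len(GLBASE):]
    return GLBASE + RENAMES.get(local, local)


# Hypothetical usage.
print(rename_predicate(GLBASE + "nameFull"))
```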

  13. With regard to geometry information, I found the following stats:

geolink:hasGeometryAsWktLiteral occurrences: 116257
geolink:Dataset instances: 160675
geolink:Dataset instances that use geolink:hasGeometryAsWktLiteral: 116256
values for geolink:hasGeometryAsWktLiteral correctly typed: 0

Sample instance with geolink:hasGeometryAsWktLiteral (instance, literal, datatype):

http://lod.dataone.org/dataset/000440bc-c10b-46c7-aaa5-63794f1a9eac
POLYGON ((-105.84993 33.64792, -104.3995 33.64792, -104.3995 33.16255, -105.84993, 33.16255))
None

So, there are approximately 44 thousand instances of the Dataset class that do not use the hasGeometryAsWktLiteral property. I didn't check whether a wktLiteral is attached to them via some other property.
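For reference, the arithmetic behind that figure, using only the counts reported above (the variable names are just for illustration):

```python
# Counts copied from the stats above.
geometry_occurrences = 116257
dataset_instances = 160675
datasets_with_geometry = 116256

# Datasets that never use hasGeometryAsWktLiteral.
datasets_without_geometry = dataset_instances - datasets_with_geometry
print(datasets_without_geometry)  # 44419

# Share of datasets that carry a geometry value.
coverage = datasets_with_geometry / dataset_instances
print(f"{coverage:.1%}")

# There is one more occurrence than there are datasets using the property;
# the stats alone do not say where the extra value comes from.
extra_occurrences = geometry_occurrences - datasets_with_geometry
print(extra_occurrences)  # 1
```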

Regarding the correct literal typing, the literal should be given as, e.g.,

"POLYGON ((-105.84993 33.64792, -104.3995 33.64792, -104.3995 33.16255, -105.84993, 33.16255))"^^http://www.opengis.net/ont/geosparql#wktLiteral

But since Yingjie said that the map interface silently assumes that the value of the hasGeometryAsWktLiteral property is always a wktLiteral, my remark about literal typing above is really just a minor one.

amoeba commented 8 years ago

Specific fixes are:

amoeba commented 8 years ago