URL string vs. URI values

ESIPFed / science-on-schema.org

science-on-schema.org - providing guidance for publishing schema.org as JSON-LD for the sciences

Apache License 2.0


URL string vs. URI values #170

Status: Open. smrgeoinfo opened this issue 3 years ago.

smrgeoinfo commented 3 years ago

Does SOSO recommend that values of properties like propertyID be LOD URIs or just strings? A convention one way or the other seems useful. In various cases the schema.org range is something like {Text | URL}, and we want the value to be treated as an IRI (URI). The choice affects harvesting, validation, display, or all three, because the encoding differs, e.g.

string: "propertyID": "https://registry.identifiers.org/registry/doi"

LOD URI: "propertyID": {"@id": "https://registry.identifiers.org/registry/doi"}

These generate different RDF triples (in N3 notation):

String literal:
<http://example.com/2462467> <http://schema.org/propertyID> "https://registry.identifiers.org/registry/doi" .

IRI node:
<http://example.com/2462467> <http://schema.org/propertyID> <https://registry.identifiers.org/registry/doi> .
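To make the difference concrete, here is a minimal Python sketch of how the two encodings map to triples. It is not a JSON-LD processor: it skips @context expansion and simply assumes that "propertyID" expands to http://schema.org/propertyID and that the subject is the example IRI used above. A bare JSON string becomes a literal; an {"@id": ...} object becomes an IRI node.

```python
import json

SUBJECT = "http://example.com/2462467"      # example subject IRI from the triples above
PROPERTY = "http://schema.org/propertyID"   # assumption: how "propertyID" expands via @context

def to_ntriple(value):
    """Serialize one propertyID value as an N-Triples line (simplified sketch)."""
    if isinstance(value, dict) and "@id" in value:
        # {"@id": ...} -> IRI node that other triples can point to
        obj = f"<{value['@id']}>"
    elif isinstance(value, str):
        # bare string -> quoted literal; json.dumps quotes and escapes it,
        # which is adequate for this sketch
        obj = json.dumps(value)
    else:
        raise TypeError(f"unsupported value: {value!r}")
    return f"<{SUBJECT}> <{PROPERTY}> {obj} ."

as_string = json.loads('{"propertyID": "https://registry.identifiers.org/registry/doi"}')
as_node = json.loads('{"propertyID": {"@id": "https://registry.identifiers.org/registry/doi"}}')

print(to_ntriple(as_string["propertyID"]))  # literal object
print(to_ntriple(as_node["propertyID"]))    # IRI object
```

The two printed lines match the two triples above: same subject and predicate, but only the second gives an object that a graph query can follow.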

smrgeoinfo commented 3 years ago

@danbri do you have any thoughts about this question?

danbri commented 3 years ago

My sense is that if you're at the level of using URIs, you might as well just mix in first-class non-schema.org properties, without going through all this indirection.

smrgeoinfo commented 2 years ago

I guess what @danbri is saying is: just use a text string for schema:propertyID?

mbjones commented 2 years ago

I think using the URI form has advantages. As you indicate, the generated triples differ, and the URI form yields a better graph, because the value becomes a node that other triples can link to rather than an opaque string literal:

<http://example.com/2462467> <http://schema.org/propertyID> <https://registry.identifiers.org/registry/doi> .

At DataONE, we load the resulting graph into a graph engine and run semantic queries over it, so for us it is much better not to represent URIs as string literals, because then the graph can be traversed with SPARQL. However, I would be surprised if this distinction were clear to most providers, so harvesters like us could also try to heuristically identify URI-valued literals and convert them into real nodes before querying. I suspect that process will be error-prone, though.
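A harvester-side heuristic along those lines might look like the minimal Python sketch below. Everything here is hypothetical: the triple representation (subject, predicate, (kind, value)), the PROMOTABLE allow-list, and the rule that only absolute http(s) URIs with a host and no whitespace are promoted. Restricting promotion to properties whose schema.org range includes URL is one way to limit false positives.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list: only promote values of properties whose
# schema.org range includes URL.
PROMOTABLE = {"http://schema.org/propertyID", "http://schema.org/url"}

def looks_like_iri(text):
    """Heuristic: an absolute http(s) URI with a host and no whitespace."""
    if any(ch.isspace() for ch in text):
        return False
    parts = urlsplit(text)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def promote_uri_literals(triples):
    """Return a new list in which URI-looking literals become IRI nodes.

    Each triple is (subject, predicate, (kind, value)), where kind is
    "literal" or "iri". The input list is not modified.
    """
    out = []
    for s, p, (kind, value) in triples:
        if kind == "literal" and p in PROMOTABLE and looks_like_iri(value):
            kind = "iri"
        out.append((s, p, (kind, value)))
    return out

triples = [
    ("http://example.com/2462467", "http://schema.org/propertyID",
     ("literal", "https://registry.identifiers.org/registry/doi")),
    ("http://example.com/2462468", "http://schema.org/propertyID",
     ("literal", "doi")),
]
for t in promote_uri_literals(triples):
    print(t)
```

The error-prone part is that the heuristic cannot know intent: a provider may have meant a URL string as plain text, and URIs published under properties outside the allow-list are left as literals.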