DataONEorg / hashstore

HashStore, a hash-based object store for DataONE data packages
Apache License 2.0

Annotation Design: N-Triple vs JSON-LD Discussion #60

Closed: doulikecookiedough closed this discussion 11 months ago

doulikecookiedough commented 1 year ago

Discuss and decide which format HashStore should use to write RDF annotation files.

doulikecookiedough commented 1 year ago

The goal of storing annotations in HashStore is to support a paging API that will allow front-end services such as MetacatUI to efficiently display and parse sections of a large dataset. Each annotation file contains statements that describe a subject's relationships with relevant objects, and must be stored in an RDF serialization format. For HashStore, we will consider two formats for storing these files: N-Triples and JSON-LD.
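To make the two candidates concrete, here is a minimal Python sketch (standard library only) that writes the same small annotation both ways: as N-Triples, with one complete statement per line, and as expanded JSON-LD, grouped by subject. The example.org subject and `ex` vocabulary IRIs are hypothetical placeholders, not real HashStore or DataONE identifiers, and the helpers `to_ntriples` and `to_jsonld` are illustrative names only.

```python
import json

# Hypothetical annotation: (subject, predicate, (kind, value)) triples, where
# kind is "iri" or "literal". All example.org IRIs are placeholders.
SUBJECT = "https://example.org/data/object-1"
ANNOTATION = [
    (SUBJECT, "http://www.w3.org/2000/01/rdf-schema#label",
     ("literal", "Sea surface temperature, 2020")),
    (SUBJECT, "https://example.org/vocab/partOf",
     ("iri", "https://example.org/data/package-1")),
    (SUBJECT, "https://example.org/vocab/keyword",
     ("literal", "oceanography")),
]


def to_ntriples(triples):
    """Serialize as N-Triples: one complete statement per line, full IRIs only."""
    lines = []
    for s, p, (kind, value) in triples:
        if kind == "iri":
            obj = f"<{value}>"
        else:
            # Escape backslash first, then quote and line breaks.
            esc = (value.replace("\\", "\\\\").replace('"', '\\"')
                        .replace("\n", "\\n").replace("\r", "\\r"))
            obj = f'"{esc}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


def to_jsonld(triples):
    """Serialize as expanded JSON-LD: one node object per subject, no @context."""
    nodes = {}
    for s, p, (kind, value) in triples:
        node = nodes.setdefault(s, {"@id": s})
        entry = {"@id": value} if kind == "iri" else {"@value": value}
        node.setdefault(p, []).append(entry)
    return json.dumps(list(nodes.values()), indent=2)


print(to_ntriples(ANNOTATION))
print(to_jsonld(ANNOTATION))
```

In this sketch the N-Triples output can be split or appended line by line without parsing, while the JSON-LD output keeps everything about a subject in one object, which is arguably easier to read and to build from client code.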

Initial Thoughts

Resources and relevant links of interest to learn more:

mbjones commented 1 year ago

Nice overview, @doulikecookiedough.

For RDF representations, the other format we've used, and one I find both readable and efficient, is Turtle. N-Triples is a simplified, line-based subset of Turtle that removes most of what I like about the compact Turtle representation. Here's an example ontology in Turtle format that we developed and use, called ADCAD (https://github.com/NCEAS/adc-disciplines/blob/main/ADCAD.ttl). I generally find Turtle syntax a lot more readable than N-Triples, and all common RDF libraries know how to use it (since N-Triples is a subset of Turtle, any Turtle parser can also read N-Triples). Here's an example:

```turtle
odo:ADCAD_00040
    a owl:Class ;
    rdfs:label "Oceanography" ;
    rdfs:subClassOf odo:ADCAD_00038 ;
    skos:exactMatch <http://localhost/plosthes.2017-1#6456>, wikidata:Q43518 .
```

That said, JSON-LD will be even more readable for most folks.
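As a rough illustration of what N-Triples gives up, the Python sketch below undoes Turtle's shorthand by hand for the ADCAD example: `a` becomes `rdf:type`, `;` repeats the subject, and `,` repeats both subject and predicate. The `odo:` and `wikidata:` namespace IRIs are assumptions made for illustration; check the `@prefix` declarations in ADCAD.ttl for the real values. The `rdf`, `rdfs`, `owl`, and `skos` entries are the standard W3C namespaces.

```python
# Sketch: expand the compact Turtle statements for odo:ADCAD_00040 into
# N-Triples. ASSUMPTION: the odo: and wikidata: IRIs below are illustrative.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "odo": "http://purl.dataone.org/odo/",          # assumed
    "wikidata": "http://www.wikidata.org/entity/",  # assumed
}


def iri(term):
    """Return an N-Triples IRI for a prefixed name or an already-bracketed IRI."""
    if term.startswith("<"):
        return term
    prefix, local = term.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def literal(text):
    """Return a plain N-Triples string literal with the required escapes."""
    esc = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{esc}"'


# Turtle shorthand undone by hand:
#   "a" -> rdf:type
#   ";" -> same subject, new predicate
#   "," -> same subject and predicate, new object
subject = "odo:ADCAD_00040"
statements = [
    ("rdf:type", iri("owl:Class")),
    ("rdfs:label", literal("Oceanography")),
    ("rdfs:subClassOf", iri("odo:ADCAD_00038")),
    ("skos:exactMatch", iri("<http://localhost/plosthes.2017-1#6456>")),
    ("skos:exactMatch", iri("wikidata:Q43518")),
]

ntriples = "\n".join(
    f"{iri(subject)} {iri(pred)} {obj} ." for pred, obj in statements
)
print(ntriples)
```

The result is five long lines, with the full subject IRI repeated on every line and `skos:exactMatch` written out twice, which is the verbosity the compact Turtle form avoids.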

In terms of use cases, I think there is some chance that client tools will need to compose and submit these annotation files, which might be more readily done in JSON-LD as a convenient syntax. Processing will almost always involve conversion to an RDF model and the use of a query language such as SPARQL, which is what our indexer currently does with both JSON-LD and RDF/XML. Since we expect each annotation file to be relatively small (a few to a dozen triples), the efficiency argument isn't a big issue. But it would be nice for various tools to be able to easily create, serialize, and deserialize the files, and JSON-LD has some nice lightweight libraries in most languages for this.
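A client-side round trip might look roughly like the following Python sketch (standard library only): a tool composes a small annotation in compact JSON-LD, and a consumer deserializes it into triples and pattern-matches on them. The `ex` vocabulary and example.org IRIs are hypothetical. `to_triples` is a deliberately minimal stand-in that handles only one flat node with prefix-style context entries; it is not a conforming JSON-LD processor, and the final list comprehension only stands in for a SPARQL query against a real RDF model.

```python
import json

# A client-authored annotation in compact JSON-LD (hypothetical IRIs).
DOC = """
{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "https://example.org/vocab/"
  },
  "@id": "https://example.org/data/object-1",
  "@type": "ex:Dataset",
  "rdfs:label": "Sea surface temperature, 2020",
  "ex:partOf": {"@id": "https://example.org/data/package-1"}
}
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(term, context):
    """Expand a 'prefix:local' term if the prefix is defined in the context."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in context:
        return context[prefix] + local
    return term


def to_triples(doc):
    """Turn ONE flat node object into (s, p, o) tuples.

    Minimal on purpose: no nested nodes, @graph, language tags, or remote
    contexts. Objects are plain strings, so IRIs and literals are not
    distinguished here the way a real RDF model would distinguish them.
    """
    ctx = doc.get("@context", {})
    s = doc["@id"]
    triples = []
    for key, value in doc.items():
        if key in ("@context", "@id"):
            continue
        for v in value if isinstance(value, list) else [value]:
            if key == "@type":
                triples.append((s, RDF_TYPE, expand(v, ctx)))
            elif isinstance(v, dict):  # node reference such as {"@id": ...}
                triples.append((s, expand(key, ctx), expand(v["@id"], ctx)))
            else:  # plain literal value
                triples.append((s, expand(key, ctx), v))
    return triples


# Deserialize the submitted file, then ask "what is this object part of?"
triples = to_triples(json.loads(DOC))
part_of = [o for s, p, o in triples if p == "https://example.org/vocab/partOf"]
print(part_of)
```

In practice both sides would use an existing JSON-LD library rather than a hand-rolled expander; the sketch only shows how little structure a small annotation file needs for a tool to create and read it.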