Annotation Design: N-Triple vs JSON-LD Discussion

The goal of storing annotations in HashStore is to facilitate a paging API that will allow front-end services/MetacatUI to efficiently display and parse sections of a large dataset. Each annotation file contains statements that represent a subject's relationships with relevant objects, and must be stored in an RDF serialization format. For HashStore, we will consider two formats for storing these files: N-Triples and JSON-LD.

N-Triples Summary:
- Simple, line-based and flat format making it easier to parse and smaller in size.
- Designed with less emphasis on human readability; it relies more on external documentation or additional statements to establish context.
- Well understood and supported in the RDF/semantic web community (all triple stores should be able to work with and support N-Triples)
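
To make the "line-based and flat" point concrete, here is a minimal sketch of reading an N-Triples file one statement at a time and returning a page of triples. It is not a conforming parser: blank nodes and escape sequences inside literals are deliberately unsupported, and the `example.org` IRIs, the `rowCount` property and the `page` helper are hypothetical names made up for illustration.

```python
import re
from itertools import islice

# Simplified N-Triples pattern: IRI subject, IRI predicate, and an object that
# is either an IRI or a literal with an optional datatype or language tag.
# Assumption: no blank nodes and no escaped quotes inside literals.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"([^"]*)"(?:\^\^<([^>]*)>|@([A-Za-z0-9-]+))?)\s*\.$'
)

def parse_line(line):
    """Return (subject, predicate, object) for one line, or None if empty."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank lines and comments carry no statement
    match = TRIPLE.match(line)
    if match is None:
        raise ValueError(f"unsupported N-Triples line: {line!r}")
    subject, predicate, obj_iri, literal, datatype, lang = match.groups()
    # IRIs stay plain strings; literals become (lexical form, datatype or lang).
    obj = obj_iri if obj_iri is not None else (literal, datatype or lang)
    return subject, predicate, obj

def page(lines, offset, limit):
    """Return up to `limit` triples, skipping the first `offset` statements."""
    triples = (t for t in map(parse_line, lines) if t is not None)
    return list(islice(triples, offset, offset + limit))

# Hypothetical annotation file content.
SAMPLE = '''\
<https://example.org/dataset/1> <http://www.w3.org/2000/01/rdf-schema#label> "Oceanography" .
<https://example.org/dataset/1> <http://example.org/vocab#rowCount> "42"^^<http://www.w3.org/2001/XMLSchema#integer> .
<https://example.org/dataset/1> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <https://example.org/dataset/2> .
'''

for triple in page(SAMPLE.splitlines(), offset=1, limit=2):
    print(triple)
```

Because every line is a complete statement, the reader never needs to hold more than one line in memory, which is the property that makes the format easy to stream.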
JSON-LD Summary:
- "Modern" format with a more human-readable, user-friendly syntax that is structured and hierarchical.
- It would integrate well with web applications or services that already use JSON.
- Feature-rich out of the box: for example, a context can map property names directly to their resource identifiers and datatypes, so literals (integers, dates, etc.) can be expressed explicitly instead of as plain strings.
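
As a rough illustration of the contextual mapping described above, the sketch below expands a flat JSON-LD node into the N-Triples lines it stands for, using only the standard `json` module. It is not a JSON-LD processor: it handles a single node with string values only, and the `example.org` IRIs and the `rowCount` term are hypothetical. A real implementation would use a JSON-LD library.

```python
import json

# A small JSON-LD document. The @context maps short property names to full
# IRIs and, for rowCount, also declares the datatype of its values.
DOC = json.loads("""
{
  "@context": {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "rowCount": {
      "@id": "http://example.org/vocab#rowCount",
      "@type": "http://www.w3.org/2001/XMLSchema#integer"
    }
  },
  "@id": "https://example.org/dataset/1",
  "label": "Oceanography",
  "rowCount": "42"
}
""")

def to_ntriples(doc):
    """Expand one flat JSON-LD node into N-Triples lines via its @context.

    Assumption: every property is defined in the context and has a string value.
    """
    context = doc["@context"]
    subject = doc["@id"]
    lines = []
    for term, value in doc.items():
        if term.startswith("@"):
            continue  # keywords such as @context and @id are not properties
        definition = context[term]
        if isinstance(definition, str):
            predicate, datatype = definition, None
        else:
            predicate, datatype = definition["@id"], definition.get("@type")
        literal = json.dumps(value)  # adds the quotes and escapes the string
        if datatype:
            literal += f"^^<{datatype}>"
        lines.append(f"<{subject}> <{predicate}> {literal} .")
    return lines

for line in to_ntriples(DOC):
    print(line)
```

The JSON-LD document keeps the IRIs and the datatype in one place, while the N-Triples output repeats them in full on every line.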

Initial Thoughts

Ease of Use: I am assuming that the front-end/MetacatUI will be interacting with the paging API to retrieve information about a dataset, and not our annotation graph directly. This means that while it would be nice if our annotation file format was more human readable, it is not a core requirement. Human friendliness is a large part of why I would potentially suggest choosing JSON-LD.
Community Support: Both N-Triples and JSON-LD are widely used and supported, and both appear to have active communities, so I would rate them roughly equal here. If anything, I would give a slight edge to N-Triples because it is a widely accepted RDF serialization format that is natively supported by most, if not all, RDF triplestores, so we would have many options to choose from.
Semantic Interoperability: While it will require a bit more effort to set up contextual mapping with N-Triples (compared to JSON-LD's built-in contextual mapping feature), it's not impossible if it is a requirement. Both formats would require a time commitment and/or a set of guidelines to follow when managing the contexts, internal or external. If we were to use SPARQL to query our RDF data, both formats would be acceptable.
Scalability: As large datasets grow, using JSON-LD will require more space to store. However, if data becomes increasingly complex/contextually rich, it may be worth it. N-Triples are a better choice if our concerns mainly revolve around storage efficiency and processing. A dataset that grows exponentially in size feels like less of an issue when using N-Triples.
Performance: To compare performance and definitively say that N-Triples is more performant, we need a better understanding of the technologies involved in our paging API. However, nearly every resource I've come across has emphasized in some way that N-Triples performs best.
What format should we use? Do the benefits of JSON-LD's built-in features (contextual mapping, data typing, nesting and structured/hierarchical syntax) present a strong enough case to outweigh the performant nature and standardized/established format of N-Triples?
- Still not sure... Discussion to be continued after determining HashStore annotation file requirements and features.

Resources and relevant links of interest to learn more:

RDF Advantages and Myths
Summary #1 of RDF formats
Summary #2 of RDF formats
Learn more about JSON-LD
Linked Data Design Issues
Learn more about RDF (Book) - Practical RDF by Shelley Powers

Nice overview, @doulikecookiedough.

For RDF representations, the other format we've used and that I find both readable and efficient is Turtle. N-Triples is a simplified version of Turtle that removes most of what I like about the compact Turtle representation. Here's an example ontology we developed and use in Turtle format called ADCAD: https://github.com/NCEAS/adc-disciplines/blob/main/ADCAD.ttl. I generally find Turtle syntax to be a lot more readable than N-Triples, and all common RDF libraries know how to use it (N-Triples is a subset of Turtle). Here's an example:

odo:ADCAD_00040
    a owl:Class ;
    rdfs:label "Oceanography" ;
    rdfs:subClassOf odo:ADCAD_00038 ;
    skos:exactMatch <http://localhost/plosthes.2017-1#6456>, wikidata:Q43518 .
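
For comparison, this sketch writes out the same five statements the way N-Triples would require, with every prefixed name expanded to a full IRI and the subject repeated on each line. The `owl`, `rdf`, `rdfs` and `skos` namespace IRIs are the standard ones; the `odo` and `wikidata` namespace IRIs are assumptions, since the excerpt above does not show its prefix declarations.

```python
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "odo": "https://purl.dataone.org/odo/",         # assumed, not in the excerpt
    "wikidata": "http://www.wikidata.org/entity/",  # assumed, not in the excerpt
}

def iri(name):
    """Expand a prefixed name such as rdfs:label to a full <IRI>."""
    prefix, local = name.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"

SUBJECT = "odo:ADCAD_00040"

# (predicate, object) pairs transcribed from the Turtle example.
STATEMENTS = [
    ("rdf:type", iri("owl:Class")),  # Turtle's `a` is shorthand for rdf:type
    ("rdfs:label", '"Oceanography"'),
    ("rdfs:subClassOf", iri("odo:ADCAD_00038")),
    ("skos:exactMatch", "<http://localhost/plosthes.2017-1#6456>"),
    ("skos:exactMatch", iri("wikidata:Q43518")),
]

def to_ntriples(subject, statements):
    """Emit one N-Triples line per statement, repeating the subject each time."""
    return [f"{iri(subject)} {iri(p)} {o} ." for p, o in statements]

for line in to_ntriples(SUBJECT, STATEMENTS):
    print(line)
```

The `;` and `,` shorthands in Turtle are what let the subject and the repeated `skos:exactMatch` predicate appear only once.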

That said, JSON-LD will be even more readable for most folks.

In terms of use cases, I think there is some chance that client tools will need to be able to compose and submit these annotation files, which might be more readily done in JSON-LD as a convenient syntax. Processing will almost always involve conversion to an RDF model and use of a query language such as SPARQL, which is what our current indexer does with both JSON-LD and RDF/XML. As we expect each annotation file to be relatively small (a few to a dozen triples), the efficiency argument isn't such a big issue. But it would be nice for various tools to be able to easily create, serialize, and deserialize the files. JSON-LD has some nice lightweight libraries in most languages for this.
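
As a sketch of how little a client would need in order to compose such a file, the snippet below builds a one-statement JSON-LD annotation as a plain dictionary and round-trips it through the standard `json` module. The `make_annotation` helper, the `annotates` term, the `hasDiscipline` property IRI and the `example.org` subject are all hypothetical; the actual vocabulary would come from the HashStore annotation requirements, and the ADCAD class IRI is an assumed expansion of `odo:ADCAD_00040`.

```python
import json

def make_annotation(subject_iri, predicate_iri, object_iri):
    """Build a one-statement JSON-LD document as a plain dict.

    "@type": "@id" tells JSON-LD processors that the value is an IRI,
    not a string literal.
    """
    return {
        "@context": {"annotates": {"@id": predicate_iri, "@type": "@id"}},
        "@id": subject_iri,
        "annotates": object_iri,
    }

annotation = make_annotation(
    "https://example.org/dataset/1",              # hypothetical subject
    "http://example.org/vocab#hasDiscipline",     # hypothetical property
    "https://purl.dataone.org/odo/ADCAD_00040",   # assumed ADCAD class IRI
)

serialized = json.dumps(annotation, indent=2)  # what a client would submit
restored = json.loads(serialized)              # what a server would read back

print(serialized)
```

Querying the result with SPARQL would still require loading it into an RDF model with a JSON-LD-aware library, which this sketch does not attempt.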
