DiSSCo / SDR

Specimen Data Refinery
Apache License 2.0

Evaluate and choose a JSON parsing tool #20

Closed by PaulBrack 2 years ago

PaulBrack commented 3 years ago

Need to choose a tool that can:

PaulBrack commented 2 years ago

There's been a change of approach since we wrote the original story: we now expect a lot more of the data type, and we no longer explicitly include the jq JSON parser in workflows such as the dummy POC.

To discuss - what functionality is inherent in the OpenDS data type?

  1. Create a fresh JSON-LD object: don't expect the data type to be able to do this.

  2. Read JSON-LD: the data type should be able to do this?

  3. Validate JSON-LD: should this be functionality of the data type? What level of validation? Certainly, if we don't have valid JSON, the data type will fail.

  4. Query JSON-LD (including querying on values): the data type should be able to do this?

  5. Insert into an existing JSON-LD object: the data type should be able to do this?
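
As a rough illustration of operations 2 to 5 in the list above, here is a minimal sketch that treats the JSON-LD document as plain JSON and uses only Python's standard `json` module. It is not taken from the SDR codebase: the record, the Darwin Core style property names, and the helper functions are all assumptions made for the example, and the validation step is deliberately shallow.

```python
import json

# Hypothetical specimen record; the property names are chosen for this sketch
# and are not taken from the openDS specification.
RAW = """
{
  "@context": {"dwc": "http://rs.tdwg.org/dwc/terms/"},
  "@id": "https://example.org/specimen/1",
  "dwc:catalogNumber": "EA021512",
  "dwc:country": "France"
}
"""


def read(text):
    """(2) Read: parse the text; raises json.JSONDecodeError on invalid JSON."""
    return json.loads(text)


def validate(doc):
    """(3) Validate: shallow check that the document is an object with an @id."""
    problems = []
    if not isinstance(doc, dict):
        problems.append("top level is not a JSON object")
    elif "@id" not in doc:
        problems.append("missing @id")
    return problems


def get_value(doc, key):
    """(4) Query: return the value stored under key, or None if absent."""
    return doc.get(key)


def matches(doc, key, expected):
    """(4) Query on values: True if the value under key equals expected."""
    return doc.get(key) == expected


def insert(doc, key, value):
    """(5) Insert: return a copy of the document with one extra property."""
    updated = dict(doc)
    updated[key] = value
    return updated


doc = read(RAW)
print(validate(doc))
print(get_value(doc, "dwc:catalogNumber"))
print(matches(doc, "dwc:country", "France"))
print(json.dumps(insert(doc, "dwc:recordedBy", "A. Collector"), indent=2))
```

This only covers what a generic JSON library offers. Anything that depends on JSON-LD semantics (context expansion, compaction, graph queries) would need a dedicated JSON-LD or RDF tool.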

infinite-dao commented 2 years ago

(1) https://json-ld.org/#developers may help to find tools, but I have not yet found a single tool that covers all of the requirements above as a jack of all trades ;-).

(2) Note: there are also Apache Jena's command-line conversion tools (https://jena.apache.org/documentation/io/), e.g. riot --validate --output=JSONLD mydata.ttl to convert triples (here a .ttl file) into JSON-LD, or riot --validate --output=JSONLD ./json/example-rdfjson.rj to convert RDF/JSON. Normally these tools handle RDF/XML, RDF/JSON, Turtle, TriX etc. and convert one format into another, but for now (version 4.1.0) JSON-LD is only available as output; as far as I can see, there is no direct JSON-LD input yet. BTW, I find the TriX format quite readable, and the sparql tool could serve for queries. In general, though, this approach would operate on N-Triples, RDF/XML, RDF/JSON or TriX, and only the output would be JSON-LD. I know it would be better to have a tool that operates on JSON-LD directly … but is there one yet?

Update: JSON-LD input is possible after all, but the query tool (sparql) relies on RDF to query data. So Apache Jena would only be a solution if the JSON-LD is temporarily converted to RDF, at least for running queries on the data.

# Example from http://coldb.mnhn.fr/catalognumber/mnhn/ea/ea021512 as RDF source
download_document="coldb.mnhn.fr⁄catalognumber⁄mnhn⁄ea⁄ea021512.rdf"
wget --header="Accept: application/rdf+xml" http://coldb.mnhn.fr/catalognumber/mnhn/ea/ea021512 --output-document="${download_document}"
  # riot --validate "${download_document}" # simply validate RDF
  # 12:10:41 WARN  riot            :: [line: 2, col: 57] {W119} A processing instruction is in RDF content. No processing was done.
  # => it contains some XSLT it warns about

## Generate JSONLD out of RDF (in this example)
  turtle  --output=jsonld "${download_document}" > "${download_document}.jsonld" # RDF => JSONLD (OK)

## Can it read JSONLD?
  turtle  --output=rdfxml "${download_document}.jsonld" > "${download_document}.jsonld.rdf" # test JSONLD => RDF (OK)

## Validation of JSONLD
  riot --validate "${download_document}.modified.jsonld" # Modification: I removed one escaping double quote from a JSON string
  # 12:27:42 ERROR riot            :: [line: 3, col: 17] Unrecognized token 'http': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
  riot --validate "${download_document}.modified.jsonld" # Modification: I changed "@id" => "@ id"
  # nothing reported
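
The second validation run above reported nothing, presumably because "@ id" is still syntactically valid JSON and is simply treated as an unrecognized key. A minimal sketch of a stricter check, assuming it is enough to flag any key that starts with "@" but is not one of the JSON-LD 1.1 keywords (the function name and the sample document are made up for this example, and this is not something riot itself does):

```python
import json

# Keywords defined by JSON-LD 1.1; any other key starting with "@" is suspicious.
JSONLD_KEYWORDS = {
    "@base", "@container", "@context", "@direction", "@graph", "@id",
    "@import", "@included", "@index", "@json", "@language", "@list",
    "@nest", "@none", "@prefix", "@propagate", "@protected", "@reverse",
    "@set", "@type", "@value", "@version", "@vocab",
}


def suspicious_keys(node, path="$"):
    """Recursively collect paths of keys that start with '@' but are not keywords."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@") and key not in JSONLD_KEYWORDS:
                found.append(f"{path}.{key}")
            found.extend(suspicious_keys(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(suspicious_keys(item, f"{path}[{index}]"))
    return found


# Hypothetical document with the same kind of typo as in the test above.
doc = json.loads('{"@ id": "http://example.org/s1", "@type": "http://example.org/Specimen"}')
print(suspicious_keys(doc))  # → ['$.@ id']
```

A check like this catches keyword typos only; it says nothing about whether the values are well-formed IRIs or whether the context resolves.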

PaulBrack commented 2 years ago

A lot of these requirements no longer apply: we are not using JSON-LD in the SDO, so I'm closing this, as existing JSON libraries appear to do the job.