filip26 / titanium-json-ld

A JSON-LD 1.1 Processor & API
https://apicatalog.com
Apache License 2.0

Expose the prefixes found in the top level @context, including remote @context. #193

Closed afs closed 4 months ago

afs commented 3 years ago

Prefixes have no standing in the RDF data model but they are convenient for display of URIs.

Describe the solution you'd like: Expose the compact URI prefix mapping from the top-level @context, maybe as a method RdfDataset.prefixes() that returns a Map<String, String>. This would be limited to the prefixes from the top-level @context, that is, the active context in scope at the end of parsing the top-level JSON, after any nested local contexts have dropped out of scope.

Describe alternatives you've considered: Secondary parsing at the JSON level of the JSON document (this is what Jena v4.2.0 does). This does not include a remote @context, as that would require re-downloading the URL or interacting with any context cache.

Jena also requires the prefix URI to end in "/", "#" or ":", and Jena includes @vocab as the prefix "". These are pragmatic Jena decisions that could be applied to the Map returned by Titanium.
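To make the two Jena rules above concrete, here is a minimal sketch. The class and method names are hypothetical and exist in neither Titanium nor Jena. The sketch assumes the term-to-IRI pairs of the top-level @context are already available as a plain Map, and it adds @vocab without applying the suffix check, which is an assumption rather than documented Jena behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JenaStylePrefixes {

    // Hypothetical helper, not part of Titanium or Jena.
    // 'declared' holds term -> IRI pairs from a top-level @context;
    // 'vocab' is the @vocab IRI, or null when the context has none.
    static Map<String, String> filter(Map<String, String> declared, String vocab) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        if (vocab != null) {
            // @vocab is exposed as the empty prefix "" (assumption: no suffix check).
            prefixes.put("", vocab);
        }
        for (Map.Entry<String, String> e : declared.entrySet()) {
            String iri = e.getValue();
            // Keep only IRIs that end in "/", "#" or ":".
            if (iri.endsWith("/") || iri.endsWith("#") || iri.endsWith(":")) {
                prefixes.put(e.getKey(), iri);
            }
        }
        return prefixes;
    }

    public static void main(String[] args) {
        Map<String, String> declared = new LinkedHashMap<>();
        declared.put("foaf", "http://xmlns.com/foaf/0.1/");
        declared.put("name", "http://example.com/person#name");
        System.out.println(filter(declared, "http://schema.org/"));
    }
}
```

With this input, "foaf" and the empty prefix are kept, while "name" is dropped because its IRI does not end in one of the three characters.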

Additional context: This came up as part of JENA-2187.

filip26 commented 3 years ago

Hi @afs, thank you for reporting that. Please help me understand the issue in order to prepare test cases.

Do I understand it right that the goal is to generate RDF Turtle from a given JSON-LD input?

The JSON-LD to RDF algorithm expands the input, and the expanded input (all prefixes are lost after this step) is converted into a node map. So I'm thinking that maybe we could somehow use the compaction algorithm to get prefixed output, or just the prefixes.

afs commented 3 years ago

Hi @filip26,

Turtle output is one use; there are several different Turtle output formats from "pretty" to a one quad-one line form which is "N-Quads+prefixes". Output does not happen when the JSON-LD is read in - the steps are read in, store, (later) write out.

Another use is converting URIs to convenient strings for UI display. In Jena, the dataset is the storage unit and it carries some prefixes with it.

The prefixes normally come from the files parsed to build the dataset.

The process of going from Titanium to Jena is:

private void read(Document document, StreamRDF output, Context context) throws Exception {
    // JSON-LD to RDF
    RdfDataset dataset = JsonLd.toRdf(document).get();
    // Prefixes come from a second pass over the JSON document
    extractPrefixes(document, output::prefix);
    // Copy the Titanium dataset into the Jena output stream
    JenaTitanium.convert(dataset, output);
}

https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/riot/lang/LangJSONLD11.java

StreamRDF is the abstraction for sending parser output.

output is typically writing into a Jena DatasetGraph - the storage abstraction.

DatasetGraph has a method prefixes() to return the prefixes carried by the dataset.

For:

{
  "@context": {
    "@version": 1.1,
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#"
  }
}

I was hoping to have RdfDataset provide a map "foaf" -> "http://xmlns.com/foaf/0.1/", "skos" -> "http://www.w3.org/2004/02/skos/core#".
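As an illustration of the requested mapping, the following sketch pulls term-to-IRI pairs out of an inline top-level @context. It is not Titanium's API: the class and method are hypothetical, the context is assumed to be already parsed into a java.util.Map by some JSON library, and remote contexts are not resolved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TopLevelContextPrefixes {

    // Hypothetical helper. 'context' is an already-parsed top-level @context object;
    // values may be Strings, Numbers, or nested Maps (expanded term definitions).
    static Map<String, String> extract(Map<String, Object> context) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : context.entrySet()) {
            String term = e.getKey();
            if (term.startsWith("@")) {
                continue; // skip keywords such as @version
            }
            Object value = e.getValue();
            if (value instanceof String) {
                // Simple term definition: "foaf": "http://..."
                result.put(term, (String) value);
            } else if (value instanceof Map) {
                // Expanded term definition: "foaf": { "@id": "http://..." }
                Object id = ((Map<?, ?>) value).get("@id");
                if (id instanceof String) {
                    result.put(term, (String) id);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("@version", 1.1);
        context.put("foaf", "http://xmlns.com/foaf/0.1/");
        context.put("skos", "http://www.w3.org/2004/02/skos/core#");
        System.out.println(extract(context));
    }
}
```

For the context above, the result holds exactly the "foaf" and "skos" entries. Covering remote contexts would need access to the processor's active context, which is what this issue asks Titanium to expose.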

Conversion between systems: https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/riot/system/JenaTitanium.java

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment!

filip26 commented 2 years ago

How to deal with conflicting prefixes?

e.g.

{
  "@context": {
    "name": "http://example.com/person#name",
    "details": "http://example.com/person#details"
  },
  "name": "Markus Lanthaler",
  "details": {
    "@context": {
      "name": "http://example.com/organization#name"
    },
    "name": "Graz University of Technology"
  }
}

converted into N-Quads:

_:b0 <http://example.com/person#details> _:b1 .
_:b0 <http://example.com/person#name> "Markus Lanthaler" .
_:b1 <http://example.com/organization#name> "Graz University of Technology" .

What keys should the prefix map contain?

afs commented 2 years ago

"name": "http://example.com/person#name" isn't really a prefix - it's a short name for a URI. Prefixes appear in Turtle as prefix:localName which is more like: "person": "http://example.com/person#" and then person:name

Those can be nested as well so there is a decision point here. There isn't a wrong answer.

RDF/XML can have nested XML namespace declarations (the XML equivalent of prefixes). It is quite unusual to see nested XML namespaces in RDF/XML - I think they would be more common in JSON-LD.

JSON is slightly different to XML because XML is parsed in encounter order and JSON is a map.

Possibility 1: ignore the inner @context and expose only the document-wide declarations.

Possibility 2 (slightly more complicated): put the declarations in as nested, with outer overriding inner.

It probably makes sense for the outer, document definition to be in the final outcome.
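A minimal sketch of the "outer overrides inner" option, assuming the prefix declarations of the document-wide context and of each nested context have already been collected into plain maps. The class, method, and the "org" prefix in the example are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrefixMerge {

    // Hypothetical helper: the document-wide declarations always win; a nested
    // declaration is added only if its prefix name is still unused.
    static Map<String, String> merge(Map<String, String> outer,
                                     List<Map<String, String>> nested) {
        Map<String, String> merged = new LinkedHashMap<>(outer);
        for (Map<String, String> inner : nested) {
            // putIfAbsent keeps the first value seen for a prefix name.
            inner.forEach(merged::putIfAbsent);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> outer = new LinkedHashMap<>();
        outer.put("ex", "http://example.com/person#");

        Map<String, String> inner = new LinkedHashMap<>();
        inner.put("ex", "http://example.com/organization#"); // conflicts with outer
        inner.put("org", "http://example.com/organization#");

        System.out.println(merge(outer, Arrays.asList(inner)));
    }
}
```

Here "ex" keeps the document-wide value and "org" is added from the nested context. Possibility 1 is the degenerate case of passing an empty list.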

HTH

filip26 commented 2 years ago

If the given example should produce a prefix map like this one:

{ 
  "person":  "http://example.com/person#", 
  "organization":  "http://example.com/organization#"
}

then we have to develop an algorithm for extracting and naming prefixes from a JSON-LD context. Perhaps we could start with a map of well-known prefixes (foaf, skos, ...).

The other option is to generate a prefix map from the N-Quads, using a part of the URL as the prefix name.
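The N-Quads option could be sketched as follows. This is only a naive heuristic under stated assumptions, not an algorithm from Titanium or Jena: the namespace is everything up to the last "#" or "/", and the prefix name is the path segment just before that cut. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

public class DerivedPrefixes {

    // Hypothetical heuristic: derive prefix name -> namespace pairs from IRIs.
    static Map<String, String> derive(Collection<String> iris) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        for (String iri : iris) {
            int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
            if (cut < 0) {
                continue; // no separator to split on
            }
            String namespace = iri.substring(0, cut + 1);
            String head = iri.substring(0, cut);
            // Name the prefix after the last path segment of the namespace.
            String name = head.substring(head.lastIndexOf('/') + 1);
            if (name.isEmpty() || prefixes.containsValue(namespace)) {
                continue; // unusable name, or namespace already covered
            }
            // On a name clash the first namespace seen keeps the name.
            prefixes.putIfAbsent(name, namespace);
        }
        return prefixes;
    }

    public static void main(String[] args) {
        System.out.println(derive(Arrays.asList(
                "http://example.com/person#details",
                "http://example.com/person#name",
                "http://example.com/organization#name")));
    }
}
```

For the three predicates in the N-Quads above this yields "person" and "organization". The heuristic gives poor names for namespaces such as http://xmlns.com/foaf/0.1/ (it would pick "0.1"), which is one reason a map of well-known prefixes may be the better starting point.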

filip26 commented 2 years ago

Just a side note from another point of view: as I understand it, prefixes are about readability. In some cases it would therefore be more beneficial for a consumer to provide its own list of well-known prefixes in order to get easily readable output.

afs commented 2 years ago

Yes. The user can add them to the Jena graph, for example, or even read a Turtle file which only has prefixes. This happens when loading N-Triples, which has no prefixes but is common for large database dumps, and the user wants to get some nicer output.
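How a consumer-supplied prefix list turns into nicer output can be sketched like this. The helper is hypothetical and independent of Jena's own prefix handling; it picks the longest matching namespace and does not check that the remaining local name is legal in Turtle.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IriAbbreviator {

    // Hypothetical helper: shorten an IRI to prefix:localName using the
    // longest matching namespace, or fall back to <iri> when nothing matches.
    static String abbreviate(String iri, Map<String, String> prefixes) {
        String bestPrefix = null;
        String bestNamespace = "";
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            String namespace = e.getValue();
            if (iri.startsWith(namespace) && namespace.length() > bestNamespace.length()) {
                bestPrefix = e.getKey();
                bestNamespace = namespace;
            }
        }
        if (bestPrefix == null) {
            return "<" + iri + ">";
        }
        return bestPrefix + ":" + iri.substring(bestNamespace.length());
    }

    public static void main(String[] args) {
        Map<String, String> wellKnown = new LinkedHashMap<>();
        wellKnown.put("foaf", "http://xmlns.com/foaf/0.1/");
        wellKnown.put("skos", "http://www.w3.org/2004/02/skos/core#");

        System.out.println(abbreviate("http://xmlns.com/foaf/0.1/name", wellKnown));
        System.out.println(abbreviate("http://example.org/other", wellKnown));
    }
}
```

The first call prints foaf:name; the second has no matching namespace and falls back to the bracketed IRI.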

filip26 commented 2 years ago

I'm preparing a low-level JsonLdProcessor API that will allow you to grab a context and/or optimize processing. The target version is 1.3.0.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment!

filip26 commented 4 months ago

V2 has been canceled because of lack of funding.

afs commented 4 months ago

Sad to hear that v2 is cancelled.

filip26 commented 4 months ago

@afs I'm sorry, but I have no other option. I hear Titanium has millions of production installations in total across various companies, but none is willing to pay a few $ back.

hmottestad commented 4 months ago

I'm also sorry to hear that v2 has been canceled.