Closed: afs closed this issue 4 months ago.
Hi @afs, thank you for reporting that. Please help me understand the issue in order to prepare test cases.
Do I understand it right that the goal is to generate RDF Turtle from a given JSON-LD input?
The JSON-LD to RDF algorithm expands the input, and the expanded input (all prefixes are lost after this step) is converted into a node map. So I'm thinking that maybe we could somehow utilize the compaction algorithm to get prefixed output, or just the prefixes.
Hi @filip26,
Turtle output is one use; there are several different Turtle output formats from "pretty" to a one quad-one line form which is "N-Quads+prefixes". Output does not happen when the JSON-LD is read in - the steps are read in, store, (later) write out.
Another use is converting URIs to convenient strings for UI display. In Jena, the dataset is the storage unit and it carries some prefixes with it.
The prefixes normally come from the files parsed to build the dataset.
The process of going from Titanium to Jena is:
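private void read(Document document, StreamRDF output, Context context) throws Exception {
    // JSON-LD to RDF
    RdfDataset dataset = JsonLd.toRdf(document).get();
    // Collect prefixes from the JSON document and pass them to the output
    extractPrefixes(document, output::prefix);
    // Convert the Titanium dataset to Jena quads and send them to the output
    JenaTitanium.convert(dataset, output);
}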
StreamRDF is the abstraction for sending parser output. The conversion maps each RdfNQuad to a Jena Quad and sends it to output. output is typically writing into a Jena DatasetGraph - the storage abstraction. DatasetGraph has a method prefixes() to return the prefixes carried by the dataset.
For:
{
"@context": {
"@version": 1.1,
"foaf" : "http://xmlns.com/foaf/0.1/",
"skos" : "http://www.w3.org/2004/02/skos/core#"
}
}
I was hoping to have RdfDataset provide a map "foaf" -> "http://xmlns.com/foaf/0.1/", "skos" -> "http://www.w3.org/2004/02/skos/core#".
Conversion between systems: https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/riot/system/JenaTitanium.java
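To make the request concrete, here is a minimal sketch of the kind of map being asked for. It is not Titanium or Jena API: the class TopLevelPrefixes and the method fromContext are hypothetical names, and the java.util.Map argument stands in for an already-parsed top-level @context object. It only handles plain string definitions and ignores remote or nested contexts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Titanium or Jena.
final class TopLevelPrefixes {

    /** Collects term -> IRI pairs from an already-parsed top-level @context object. */
    static Map<String, String> fromContext(Map<String, ?> context) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : context.entrySet()) {
            // Skip keywords such as @version; keep only plain string definitions.
            if (!e.getKey().startsWith("@") && e.getValue() instanceof String) {
                prefixes.put(e.getKey(), (String) e.getValue());
            }
        }
        return prefixes;
    }
}
```

For the context above, this yields the two entries foaf and skos, and drops @version.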
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment!
How to deal with conflicting prefixes?
e.g.
{
"@context": {
"name": "http://example.com/person#name",
"details": "http://example.com/person#details"
},
"name": "Markus Lanthaler",
"details": {
"@context": {
"name": "http://example.com/organization#name"
},
"name": "Graz University of Technology"
}
}
converted into n-quads
_:b0 <http://example.com/person#details> _:b1 .
_:b0 <http://example.com/person#name> "Markus Lanthaler" .
_:b1 <http://example.com/organization#name> "Graz University of Technology" .
What keys should contain the prefix map?
"name": "http://example.com/person#name" isn't really a prefix - it's a short name for a URI.
Prefixes appear in Turtle as prefix:localName, which is more like "person": "http://example.com/person#" and then person:name.
Those can be nested as well so there is a decision point here. There isn't a wrong answer.
RDF/XML can have nested XML namespace declarations (the XML equivalent of prefixes). It is quite unusual to see nested XML namespaces in RDF/XML - I think they would be more common in JSON-LD.
JSON is slightly different to XML because XML is parsed in encounter order and JSON is a map.
Possibility 1: ignore the inner @context and only expose the document-wide declarations.
Possibility 2: slightly more complicated is "put in as nested - outer overrides inner"
It probably makes sense for the outer, document definition to be in the final outcome.
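Possibility 2 could be sketched roughly as below. This is an illustration under assumptions, not Titanium behaviour: NestedContextPrefixes and collect are hypothetical names, the document is assumed to be already parsed into nested java.util.Map and Iterable values, and only plain string definitions are considered. An outer definition is always recorded before anything nested inside it, so it wins; between unrelated sibling branches, whichever is visited first wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Titanium or Jena.
final class NestedContextPrefixes {

    /** Collects string definitions from every @context; outer definitions override inner ones. */
    static Map<String, String> collect(Map<String, ?> document) {
        Map<String, String> out = new LinkedHashMap<>();
        walk(document, out);
        return out;
    }

    private static void walk(Object value, Map<String, String> out) {
        if (value instanceof Map) {
            Map<?, ?> node = (Map<?, ?>) value;
            // Handle this node's own @context first so it is seen before any nested one.
            Object ctx = node.get("@context");
            if (ctx instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) ctx).entrySet()) {
                    if (e.getKey() instanceof String && e.getValue() instanceof String
                            && !((String) e.getKey()).startsWith("@")) {
                        // putIfAbsent keeps the earlier (outer) definition.
                        out.putIfAbsent((String) e.getKey(), (String) e.getValue());
                    }
                }
            }
            // Then descend into the remaining members.
            for (Map.Entry<?, ?> e : node.entrySet()) {
                if (!"@context".equals(e.getKey())) {
                    walk(e.getValue(), out);
                }
            }
        } else if (value instanceof Iterable) {
            for (Object item : (Iterable<?>) value) {
                walk(item, out);
            }
        }
    }
}
```

Applied to the "Markus Lanthaler" example, name stays mapped to http://example.com/person#name and the inner organization definition is dropped. Possibility 1 is the same code without the descent step.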
HTH
If the given example should produce a prefix map like this one:
{
"person": "http://example.com/person#",
"organization": "http://example.com/organization#"
}
then we have to develop an algorithm for extracting and naming prefixes from a JSON-LD context. Perhaps we could start with a map of well-known prefixes (foaf, skos, ...).
The other option is to generate a prefix map from the N-Quads, using a part of the URL as the prefix name.
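A naive version of that second option might look like the sketch below. The names DerivedPrefixes and fromIris are hypothetical, and the splitting rule (cut at the last "#" or "/", name the prefix after the last path segment of the namespace) is an assumption made for illustration. It does not check that the derived name is a legal Turtle prefix, and when two namespaces yield the same name, only the first is kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Titanium or Jena.
final class DerivedPrefixes {

    /** Derives prefix name -> namespace pairs from the IRIs that appear in a set of quads. */
    static Map<String, String> fromIris(Iterable<String> iris) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        for (String iri : iris) {
            // Split into namespace and local name at the last '#' or '/'.
            int cut = Math.max(iri.lastIndexOf('#'), iri.lastIndexOf('/'));
            if (cut < 0 || cut == iri.length() - 1) {
                continue; // no separator, or no local name after it
            }
            String namespace = iri.substring(0, cut + 1);
            // Use the last path segment of the namespace as the prefix name.
            String base = namespace.substring(0, namespace.length() - 1);
            String name = base.substring(base.lastIndexOf('/') + 1);
            if (name.isEmpty()) {
                continue;
            }
            // First namespace seen for a given name wins.
            prefixes.putIfAbsent(name, namespace);
        }
        return prefixes;
    }
}
```

Fed the three predicate IRIs from the N-Quads above, this produces the person and organization entries shown in the example map. A namespace such as http://xmlns.com/foaf/0.1/ would get the unhelpful name "0.1", which is where a table of well-known prefixes would help.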
An aside, from another point of view: as I understand it, prefixes are about readability. Thus, in some cases it would be more beneficial for a consumer to provide its own list of well-known prefixes in order to get easily readable output.
Yes. The user can add them to the Jena graph, for example, or even read a Turtle file which only has prefixes. This happens when loading N-Triples - no prefixes, but common for large database dumps - and the user wants to get some nicer output.
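As an illustration of why a consumer-supplied map is enough for display purposes, here is a small sketch of abbreviating a URI against a prefix map. PrefixAbbreviator and abbreviate are hypothetical names, not Jena's own abbreviation code; the longest matching namespace is chosen, and the local name is not checked against Turtle's grammar.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of Titanium or Jena.
final class PrefixAbbreviator {

    /** Returns prefix:localName when a namespace matches, otherwise the IRI in angle brackets. */
    static String abbreviate(Map<String, String> prefixes, String iri) {
        String bestPrefix = null;
        String bestNamespace = "";
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            String namespace = e.getValue();
            // Prefer the longest namespace that the IRI starts with.
            if (iri.startsWith(namespace) && namespace.length() > bestNamespace.length()) {
                bestPrefix = e.getKey();
                bestNamespace = namespace;
            }
        }
        return bestPrefix == null
                ? "<" + iri + ">"
                : bestPrefix + ":" + iri.substring(bestNamespace.length());
    }
}
```

With a map containing foaf -> http://xmlns.com/foaf/0.1/, the IRI http://xmlns.com/foaf/0.1/name is shown as foaf:name, and an IRI with no matching namespace is left in its full form.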
I'm preparing a low-level JsonLdProcessor API that will allow you to grab a context and/or optimize processing. The target version is 1.3.0.
V2 has been canceled because of lack of funding.
Sad to hear that v2 is cancelled.
@afs I'm sorry, but I have no other option. I hear Titanium has millions of production installations in total across various companies, but none is willing to pay a few $ back.
I'm also sorry to hear that v2 has been canceled.
Prefixes have no standing in the RDF data model but they are convenient for display of URIs.
Describe the solution you'd like
Expose the compact URI prefix mapping from the top-level @context, maybe a method RdfDataset.prefixes() that returns a Map<String, String>. This would be limited to the prefixes from the top-level @context, the active context in scope at the end of parsing the top-level JSON after any nested local contexts have dropped out of scope.
Describe alternatives you've considered
Secondary parsing at the JSON level of the JSON document (this is what Jena v4.2.0 does). This does not include a remote @context, as it would require re-downloading the URL or interacting with any context cache.
Jena also requires the prefix URI to end in "/", "#" or ":" and Jena includes @vocab as prefix "". There are pragmatic Jena decisions that could be applied to the Map returned by Titanium.
Additional context
This came up as part of JENA-2187.
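The Jena-side rules described above could be applied to such a map with something like the following sketch. JenaStylePrefixFilter and apply are hypothetical names and this is not Jena's actual code; the input is assumed to be the string-valued entries of the top-level @context, including @vocab if present. Whether the ending rule is also applied to @vocab is not stated here, so the sketch does not apply it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Titanium or Jena.
final class JenaStylePrefixFilter {

    /** Keeps entries whose IRI ends in "/", "#" or ":" and maps @vocab to the empty prefix. */
    static Map<String, String> apply(Map<String, String> contextTerms) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : contextTerms.entrySet()) {
            String term = e.getKey();
            String iri = e.getValue();
            if ("@vocab".equals(term)) {
                // @vocab becomes the empty prefix (assumption: no ending check here).
                prefixes.put("", iri);
            } else if (!term.startsWith("@")
                    && (iri.endsWith("/") || iri.endsWith("#") || iri.endsWith(":"))) {
                prefixes.put(term, iri);
            }
        }
        return prefixes;
    }
}
```

A term such as "name": "http://example.com/person#name" is filtered out because its IRI does not end in a namespace separator, which matches the earlier point that it is a short name for a URI, not a prefix.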