eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License

JSON-LD 1.1 support #3654

Closed VladimirAlexiev closed 6 months ago

VladimirAlexiev commented 2 years ago

JSON-LD is becoming more and more important, especially for Decentralized Identifiers (DIDs), Verifiable Credentials, IoT, etc.

Some initiatives are using JSON Schema to specify their JSON-LD payload, which requires the ability to produce (write out) very precise JSON-LD.

An examination of "JSON-LD" issues here shows a number of stalled issues and some bugs.

Some of them are due to this project (e.g. being unable to specify a context while writing/compacting), others are due to the underlying jsonld-java.

In particular, it's unclear whether jsonld-java will support 1.1. Two crucial 1.1 features are Framing and Scoped contexts.

I find https://github.com/jsonld-java/jsonld-java/pull/284#issuecomment-653521148 especially telling. Conformance test percentages are very low.

There's a suggestion to switch to https://github.com/filip26/titanium-json-ld. Its conformance test percentages are nearly perfect: https://w3c.github.io/json-ld-api/reports/#subj_Titanium_JSON_LD_Java. A guy who seems to know stuff about JSON-LD recommends it: https://github.com/w3c/vc-data-model/issues/843#issuecomment-1024347690.

But a later comment states that jsonld-java is significantly more performant.

Please comment:

Cc @jeenbroekstra @JervenBolleman @fsteeg @ansell

VladimirAlexiev commented 2 years ago

jsonld-java also does not have canonicalization (URDNA2015), see https://github.com/jsonld-java/jsonld-java/issues/249, which is important for crypto signing apps. Titanium has it (as an extension): https://github.com/filip26/titanium-json-ld#extensions
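To illustrate why a canonical form matters for signing, here is a minimal sketch that hashes a set of N-Quads lines independently of their input order. It is not URDNA2015: it assumes the data contains no blank nodes (the hard part that URDNA2015 solves) and that every line is already written in canonical N-Quads syntax. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch, NOT an implementation of URDNA2015.
class CanonicalHashSketch {

    // Returns a hex SHA-256 digest that does not depend on the order of the input lines.
    // Assumption: no blank nodes, and each line is already canonical N-Quads.
    static String canonicalHash(List<String> nquads) {
        // TreeSet drops duplicate statements and sorts the rest.
        // Simplification: String ordering is by UTF-16 code unit, not by code point.
        TreeSet<String> sorted = new TreeSet<>(nquads);
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String line : sorted) {
                digest.update((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String q1 = "<http://example.org/s> <http://example.org/p> \"1\" .";
        String q2 = "<http://example.org/s> <http://example.org/p> \"2\" .";
        // The same statements in a different order produce the same digest.
        System.out.println(canonicalHash(List.of(q1, q2)).equals(canonicalHash(List.of(q2, q1))));
    }
}
```

Without such a step, two serializations of the same graph can yield different bytes and therefore different signatures. Blank nodes are what make the real algorithm non-trivial, since their labels are arbitrary and must be assigned deterministically.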

https://github.com/kbss-cvut/jb4jsonld/issues/37 also considers switching to titanium.

ansell commented 2 years ago

I haven't had any bandwidth to support JSONLD-Java and the 1.1 features have not been added, so no qualms with switching to a library that has 1.1 features implemented.

barthanssens commented 2 years ago

Would it be an option to be able to switch between both? I.e. there could be a second Rio module for the Titanium JSON-LD library, as long as both aren't included at the same time, AND if some refactoring of the way RDF4J sets options is done (e.g. #1755), it might provide a gentle upgrade path...
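A rough sketch of what "one module or the other, but not both" could look like. All names here (`JsonLdBackend`, `JsonLdBackendRegistry`, `BackendSketch`) are hypothetical and exist in neither RDF4J, jsonld-java nor Titanium; in a real setup the modules would presumably be discovered from the classpath rather than registered by hand.

```java
// Hypothetical sketch: none of these types exist in RDF4J, jsonld-java or Titanium.
interface JsonLdBackend {
    String name();

    boolean supportsJsonLd11();
}

class JsonLdBackendRegistry {
    private JsonLdBackend active;

    // Accepts exactly one backend; a second registration is treated as a packaging error.
    void register(JsonLdBackend backend) {
        if (active != null) {
            throw new IllegalStateException("Both " + active.name() + " and " + backend.name()
                    + " are present; include only one JSON-LD module");
        }
        active = backend;
    }

    JsonLdBackend get() {
        if (active == null) {
            throw new IllegalStateException("No JSON-LD backend module found");
        }
        return active;
    }
}

class BackendSketch {
    // Small factory so the example needs no real parser implementation.
    static JsonLdBackend backend(String name, boolean jsonLd11) {
        return new JsonLdBackend() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public boolean supportsJsonLd11() {
                return jsonLd11;
            }
        };
    }

    public static void main(String[] args) {
        JsonLdBackendRegistry registry = new JsonLdBackendRegistry();
        registry.register(backend("titanium", true));
        System.out.println(registry.get().name() + " supports 1.1: " + registry.get().supportsJsonLd11());
    }
}
```

Failing fast on a second registration makes the "not both at the same time" constraint explicit, instead of silently picking whichever module happens to load first.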

VladimirAlexiev commented 2 years ago

@barthanssens That's an option, as soon as the pros and cons of each alternative are clearly described.

On the topic of Parser performance (direction JSONLD->RDF):

Streaming may improve performance by significantly reducing required memory

VladimirAlexiev commented 2 years ago

Jena has integrated Titanium to a large degree: https://issues.apache.org/jira/browse/JENA-1948.

Update on https://github.com/umbreak/jsonld-benchmarks (great news!):

The Json-LD Java implementation is ~ 4.6 times faster in average than Titanium. In the current state (02.04.2022), the Titanium library is 2x faster than in its initial state (03.12.2020).

VladimirAlexiev commented 2 years ago

https://github.com/json-ld/yaml-ld/issues/20#issuecomment-1180180856 has some info on JSON-LD 1.1 conformance, including a summary table.

Conformance leaders: Titanium (Java), JSON::LD (Ruby), PyLD (Python), jsonld.js (JavaScript)

VladimirAlexiev commented 2 years ago

We need some request headers (or criteria) to decide when fetching JSON-LD data from the repo:

This goes beyond jsonld 1.1 support and back to 1.0 support:
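One candidate for such a request header is the `profile` parameter of the `application/ld+json` media type, which the JSON-LD specifications define with URIs such as `http://www.w3.org/ns/json-ld#expanded`, `#compacted` and `#flattened`. The sketch below shows how a server could read it from an `Accept` header. The class name and the `Form` enum are hypothetical, q-values are ignored, and the naive splitting assumes parameter values contain no commas or semicolons.

```java
// Hypothetical sketch of reading the JSON-LD "profile" media type parameter.
class JsonLdProfileSketch {
    static final String NS = "http://www.w3.org/ns/json-ld#";

    enum Form { EXPANDED, COMPACTED, FLATTENED, UNSPECIFIED }

    // Returns the first recognised document form requested for application/ld+json.
    static Form requestedForm(String acceptHeader) {
        if (acceptHeader == null) {
            return Form.UNSPECIFIED;
        }
        for (String range : acceptHeader.split(",")) {
            String[] parts = range.trim().split(";");
            if (!parts[0].trim().equalsIgnoreCase("application/ld+json")) {
                continue; // some other media type, e.g. text/turtle
            }
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                int eq = param.indexOf('=');
                if (eq < 0 || !param.substring(0, eq).trim().equalsIgnoreCase("profile")) {
                    continue;
                }
                // The value may be quoted and may hold several space-separated URIs.
                String value = param.substring(eq + 1).trim().replace("\"", "");
                for (String uri : value.split("\\s+")) {
                    if (uri.equals(NS + "expanded")) return Form.EXPANDED;
                    if (uri.equals(NS + "compacted")) return Form.COMPACTED;
                    if (uri.equals(NS + "flattened")) return Form.FLATTENED;
                }
            }
        }
        return Form.UNSPECIFIED;
    }

    public static void main(String[] args) {
        String accept = "application/ld+json;profile=\"" + NS + "compacted\"";
        System.out.println(requestedForm(accept));
    }
}
```

How a given form would then map onto writer settings is a separate question that this sketch does not address.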

amivanoff commented 1 year ago

It looks like after the latest improvements in Titanium (v. 1.3.2), the JSON-LD Java implementation is only ~17% faster on average than Titanium. And in one test Titanium even outperforms JSON-LD Java by 17%.

hmottestad commented 1 year ago

> It looks like after the latest improvements in Titanium (v. 1.3.2), the JSON-LD Java implementation is only ~17% faster on average than Titanium. And in one test Titanium even outperforms JSON-LD Java by 17%.

Which implementations are you comparing? And what are you comparing?

amivanoff commented 1 year ago

> > It looks like after the latest improvements in Titanium (v. 1.3.2), the JSON-LD Java implementation is only ~17% faster on average than Titanium. And in one test Titanium even outperforms JSON-LD Java by 17%.
>
> Which implementations are you comparing? And what are you comparing?

It's the same test as in https://github.com/eclipse/rdf4j/issues/3654#issuecomment-1091454538

The test is https://github.com/umbreak/jsonld-benchmarks, but run with the latest Titanium JSON-LD 1.3.2 (March 2023) and the latest JSONLD-Java 0.13.4 (December 2021).

hmottestad commented 1 year ago

I'm working on contributing some performance optimisations to Titanium JSON-LD. Based on my single benchmark file with 600 000 triples I've currently managed to improve JSON-LD to RDF conversion by 3x, expanding by almost 2x and flattening by 4x.

barthanssens commented 1 year ago

Impressive!