blake-regalia / graphy.js

A collection of RDF libraries for JavaScript
https://graphy.link/
ISC License

Is there a way to use the embedded blank nodes form when serializing to Turtle or TriG? #39

Open EmmanuelOga opened 3 years ago

EmmanuelOga commented 3 years ago

Hi,

I'm trying to pretty-print a NQuads file:

cat .\file.nq | yarn run graphy read -c nquads / tree / write -c ttl

I was hoping to be able to serialize into a format like Turtle or TriG with embedded blank nodes, which are a lot easier to grok than the expanded form. So instead of:

_:b0 <url:prop1> _:b1 .
_:b1 <url:prop2> "something" .

... I was expecting to get:

_:b0 <url:prop1> [ <url:prop2> "something" ] .

Is there a way to produce the latter form with graphy?

Thanks!

P.S.: in practice, I'm generating the nquads from some JSON-LD provided by a library like this. If you can recommend a nicer way to go from json-ld to pretty-printed TriG it would solve my actual problem more directly. Just thought I would mention this 😄

P.S.2: I don't think graphy does this right now, but how crazy would it be to "serialize" to C3? The use case is to grab some example JSON-LD from a 3rd party, and convert it to JavaScript. After that I could wrap the whole thing into a function and generate parameterized JSON-LD in no time!

blake-regalia commented 3 years ago

Thanks for the issue, it is an interesting topic.

To guard against the case where a blank node is serialized anonymously but referenced again later, graphy only serializes blank node property lists in two situations: in the form [] :p1 :o1 when the subject is an EphemeralBlankNode (needed for bnode subjects in c3), and as nested [ ... ] blocks when concise quads use the nested object structure for nested objects. Combining the two looks like this:

const graphy = require('@graphy/core.data.factory');
const turtle_writer = require('@graphy/content.ttl.write');

const ds_out = turtle_writer({
  prefixes: {
    rdf: 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  },
});

ds_out.pipe(process.stdout);
ds_out.write({
  type: 'c3',
  value: {
    [graphy.ephemeral()]: {
      '>url:prop1': {
        '>url:prop2': '"something',
      },
    },
    '>url:collection': {
      '>url:looksLike': [  // object list
        [  // RDF collection
          {  // blank node
            a: '>url:Thing',
            '>url:value': '"Hello,',
          },
          {  // another blank node
            a: '>url:Thing',
            '>url:value': '@en"World.',
          },
        ],
      ],
    },
  },
});

Outputs:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

[] <url:prop1> [
    <url:prop2> "something" ;
  ] .

<url:collection> <url:looksLike> (
    [
      rdf:type <url:Thing> ;
      <url:value> "Hello," ;
    ]
    [
      rdf:type <url:Thing> ;
      <url:value> "World."@en ;
    ]
  ) .

I don't think graphy does this right now, but how crazy would it be to "serialize" to C3?

This has been brought up before but I am still not sure a serializer/deserializer for c3 would be needed. Assuming you are only using strings, booleans and numbers (and nested arrays or objects composed of those primitives), then the whole payload can be serialized as JSON. If you are using references or special datatypes such as Date, then vanilla JS can also be used to reconstruct the c3 object in memory. What would be the role of a special serializer/deserializer?
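A minimal sketch of that JSON round trip, assuming the payload holds only strings, arrays, and plain objects. The variable names are illustrative, and nothing here calls graphy itself:

```javascript
// A c3 payload built only from strings, arrays and plain objects.
const payload = {
  type: 'c3',
  value: {
    '>url:collection': {
      '>url:looksLike': [
        [
          {a: '>url:Thing', '>url:value': '"Hello,'},
          {a: '>url:Thing', '>url:value': '@en"World.'},
        ],
      ],
    },
  },
};

// Plain JSON is enough to store or transmit it...
const serialized = JSON.stringify(payload, null, 2);

// ...and to rebuild an equivalent object in memory later.
const restored = JSON.parse(serialized);
```

The restored object has the same shape as the original, so it could be handed to a writer in the same way as the hand-written one above.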

EmmanuelOga commented 3 years ago

What would be the role of a special serializer/deserializer?

I'm not sure if you are talking about a special serializer that recognizes blank nodes the way I need them to, or a special deserializer to 'c3'.

In any case, my use cases are around grabbing a large serialized blob of triples and expressing it in such a way that is:

My current source of examples is a library that generates json-ld, but graphy doesn't read json-ld, so I turned them into nquads using the n3 npm package.

An example

Say I had this Turtle ... or even its nquads form, which libraries more often generate.

<url:collection> <url:looksLike> (
    [rdf:type <url:Thing> ; <url:value> "Hello," ; ]
    [rdf:type <url:Thing> ;<url:value> "World."@en ;]) .

I want to automatically obtain:

{type: 'c3', value: {
  '>url:collection': {
    '>url:looksLike': [[
      {a: '>url:Thing', '>url:value': '"Hello,'},
      {a: '>url:Thing', '>url:value': '@en"World.'},
    ]],
  },
}}

Why? Because it gives me something really easy to turn into a JavaScript function like this:

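function generateCollection(values) {
  // Parenthesize the object literal so the arrow function returns it
  const things = values.map(val => ({a: '>url:Thing', '>url:value': `"${val}`}));
  // The extra array makes the items an RDF collection, as in the example above
  return {type: 'c3', value: {'>url:collection': {'>url:looksLike': [things]}}};
}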

The fastest way for me to generate a certain shape of triples is by copying an (often very large) existing example of the serialized triples (often full of blank nodes that are only used once).

If I had the c3 version generated for me it would be very easy to go from "gigantic example blob of triples" to "parameterizable triple generator".

To be extremely specific, my problem is: given this example, which describes a specific payload for a server request, I want to be able to generate similar payloads with different parameters (but I also want the tersest Turtle serialization possible, which is also the prettiest and easiest to read).

blake-regalia commented 3 years ago

The simplest approach to get what you want would be to introduce a new feature to the Dataset class that computes blank node closure (i.e., deduces which blank nodes are used in the object position at most once) and then just dumps or prints the contents with some slight pruning, since it already uses a c3 data structure internally to store the quads. This would still be a bit of work but is the best solution I can think of at the moment. In the meantime, hopefully the documentation provides enough clarity on how to use c3 for the pretty-printing you're after.
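As a rough illustration of the counting step only, here is a sketch in plain JavaScript. It is not part of graphy's Dataset class. It assumes triples are given as hypothetical {s, p, o} objects of strings, with blank nodes labeled by a leading `_:`, and the function name is made up for this example:

```javascript
// Count how often each blank node appears in the object position.
// Blank nodes that only ever appear as subjects get a count of 0.
function objectUseCounts(triples) {
  const counts = new Map();
  for (const {s, o} of triples) {
    if (s.startsWith('_:') && !counts.has(s)) counts.set(s, 0);
    if (o.startsWith('_:')) counts.set(o, (counts.get(o) || 0) + 1);
  }
  return counts;
}

// Blank nodes used as an object at most once are candidates
// for the anonymous [ ... ] or [] form.
function embeddableBlankNodes(triples) {
  const counts = objectUseCounts(triples);
  return new Set(
    [...counts].filter(([, n]) => n <= 1).map(([label]) => label)
  );
}

const triples = [
  {s: '_:b0', p: 'url:prop1', o: '_:b1'},
  {s: '_:b1', p: 'url:prop2', o: '"something'},
  {s: '_:b0', p: 'url:prop3', o: '_:b2'},
  {s: '_:b3', p: 'url:prop3', o: '_:b2'},
];

// _:b2 is referenced twice, so it must keep its label.
const safe = embeddableBlankNodes(triples);
```

This sketch ignores cycles between blank nodes (including a node that references itself), which a real implementation would also have to exclude.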

tpluscode commented 3 years ago

In case you're interested, I implemented a transform stream which consumes an RDF/JS stream first building up a c4(r) structure and passing that to graphy: TransformToConciseHash

It takes care to only embed blank nodes which are used only once. Otherwise they will be serialised as _: identifiers

Might export as a package of its own if you'd like.
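For readers who want to see the general idea of embedding only the blank nodes that are used once, here is a simplified sketch in plain JavaScript. It is not the TransformToConciseHash implementation and does not produce graphy's exact c3/c4 structures. The {s, p, o} input shape and the `nest` function are assumptions made for illustration:

```javascript
// Nest blank nodes that are referenced exactly once under their parent;
// everything else stays at the top level and keeps its label.
function nest(triples) {
  const bySubject = new Map(); // subject -> list of [predicate, object]
  const uses = new Map();      // blank node -> times used as an object
  for (const {s, p, o} of triples) {
    if (!bySubject.has(s)) bySubject.set(s, []);
    bySubject.get(s).push([p, o]);
    if (o.startsWith('_:')) uses.set(o, (uses.get(o) || 0) + 1);
  }

  // A node can be inlined if it is referenced once and has its own triples.
  const inline = (label) => uses.get(label) === 1 && bySubject.has(label);

  const build = (subject) => {
    const node = {};
    for (const [p, o] of bySubject.get(subject)) {
      const value = inline(o) ? build(o) : o;
      (node[p] = node[p] || []).push(value);
    }
    return node;
  };

  const out = {};
  for (const s of bySubject.keys()) {
    if (!inline(s)) out[s] = build(s);
  }
  return out;
}

const nested = nest([
  {s: '_:b0', p: '>url:prop1', o: '_:b1'},
  {s: '_:b1', p: '>url:prop2', o: '"something'},
  {s: '_:b0', p: '>url:prop3', o: '_:b2'},
  {s: '_:b3', p: '>url:prop3', o: '_:b2'},
  {s: '_:b2', p: '>url:prop4', o: '"shared'},
]);
```

Here `_:b1` ends up nested under `_:b0`, while `_:b2` is referenced twice and therefore stays a labeled top-level entry. Cycles of blank nodes are not handled in this sketch and would need a separate check.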