blake-regalia / graphy.js

A collection of RDF libraries for JavaScript
https://graphy.link/
ISC License

How to iterate over a graphy RDF Dataset? #24

Closed happybeing closed 4 years ago

happybeing commented 4 years ago

@blake-regalia I'm trying to make use of a graphy RDF dataset, but I'm confused about how to use it and whether I have access to the Dataset interface or only DatasetCore. I see Dataset has various iterators (map, reduce, forEach) that I don't see on the object made using RdfDataset() (where const RdfDataset = require('@graphy/memory.dataset.fast');). So I think I have DatasetCore, but am not sure if I also have Dataset.

Your API docs do refer to * [Symbol.iterator]() per @RDFJS/dataset (here), but I don't understand what that means. The RDFJS spec isn't clear enough for me here.

Can you tell me which kinds of iteration are supported in graphy, and can you give some examples?

Is there a way to access the Dataset interface or only DatasetCore?

By the way, the link to @RDFJS/dataset in your API Docs is broken, and should point to https://rdf.js.org/dataset-spec/#datasetcore-interface but is in fact https://rdf.js.org/dataset-spec/dataset-spec.html#datasetcore-interface

blake-regalia commented 4 years ago

The RDFJS Dataset interface is in an experimental phase, and some new retrieval functions such as filter and forEach made their way into the spec after I wrote the implementation. However, the community is still discussing whether they really belong there, since they are somewhat pointless (e.g., developers can simply filter within iterators).
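As a rough illustration of what "filter within iterators" can look like, the sketch below uses a plain Set of quad-like objects as a stand-in for a dataset; the only thing it assumes about a real dataset is that it is iterable. The helper name `filterQuads` is hypothetical and not part of graphy or the RDFJS spec, and the terms are simplified to strings rather than RDFJS term objects.

```javascript
// Stand-in for an iterable dataset: a Set of plain quad-like objects.
// (Assumption: real quads carry term objects, simplified to strings here.)
const quads = new Set([
    {subject: 'ex:alice', predicate: 'foaf:knows', object: 'ex:bob'},
    {subject: 'ex:alice', predicate: 'foaf:name', object: '"Alice"'},
    {subject: 'ex:bob', predicate: 'foaf:name', object: '"Bob"'},
]);

// Hypothetical helper: lazily yield only the quads that pass the test.
function* filterQuads(iterable, test) {
    for (const quad of iterable) {
        if (test(quad)) yield quad;
    }
}

// Option 1: filter inline inside a for...of loop.
const names = [];
for (const quad of quads) {
    if (quad.predicate !== 'foaf:name') continue;
    names.push(quad.object);
}

// Option 2: filter through the generator, then collect into an array.
const aliceQuads = [...filterQuads(quads, (q) => q.subject === 'ex:alice')];
```

Either form works on anything that implements the iteration protocol, so no dedicated filter or forEach method is needed on the dataset itself.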

The simplest way to iterate over all the quads within the FastDataset is by using the standard iteration protocol. The not-so-obvious part may be that if you are loading quads into the dataset using streams, you will have to wait for that process to complete by attaching an event listener. Here's an example in Node.js that should translate to the browser using the methods we discussed earlier:

const Dataset = require('@graphy/memory.dataset.fast');
const ttl_read = require('@graphy/content.ttl.read');

// create a new dataset instance
let k_dataset = Dataset();

// pipeline
process.stdin
    // read turtle
    .pipe(ttl_read())

    // load all triples into dataset
    .pipe(k_dataset);

// wait for the Dataset's @nodejs/Transform superclass to emit the 'finish' event
k_dataset.on('finish', () => {
    // do something fancy like canonicalize the dataset
    k_dataset.canonicalize();

    // here is the iteration part, you can do this as many times as you'd like
    for(let y_quad of k_dataset) {
        console.warn(y_quad.verbose());
    }

    // print dataset size
    console.warn(`== ${k_dataset.size} quads ==`);
});
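For readers unfamiliar with the idiom, the `for...of` loop above works because the dataset exposes a `*[Symbol.iterator]()` generator method, which is what the standard iteration protocol looks for. The sketch below shows the mechanism on a hypothetical `TinyDataset` class; it is not graphy's implementation, and the quads are simplified to plain objects with string fields.

```javascript
// Hypothetical minimal container, for illustration only (not graphy's code).
class TinyDataset {
    constructor() {
        this._quads = [];
    }

    add(quad) {
        this._quads.push(quad);
        return this;
    }

    get size() {
        return this._quads.length;
    }

    // Generator method: defining this is what makes instances iterable.
    *[Symbol.iterator]() {
        for (const quad of this._quads) yield quad;
    }
}

const tiny = new TinyDataset()
    .add({subject: 'ex:a', predicate: 'ex:p', object: 'ex:b'})
    .add({subject: 'ex:b', predicate: 'ex:p', object: 'ex:c'});

// Any iterable can be consumed with for...of...
const subjects = [];
for (const quad of tiny) subjects.push(quad.subject);

// ...or turned into an array to get map, filter, reduce and forEach.
const asArray = [...tiny];
const objects = Array.from(tiny, (q) => q.object);
const reached = asArray.filter((q) => q.object === 'ex:c').map((q) => q.subject);
```

Spreading into an array costs one extra copy of the quads, so for large datasets the plain `for...of` loop is usually the lighter choice.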

There is also a more advanced method, not currently documented because it needs more development and testing, that involves iterating a forest of trees making up the quads by graph/subject/predicate/object. This is much faster and more memory efficient than iterating one quad at a time, but it will also take a bit of energy to convince the rest of the community to adopt it into the standard >_>
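To give a feel for the general idea only, here is a sketch of a graph/subject/predicate/object tree built from nested Maps. Everything in it (the `forest` structure, the `addQuad` helper, the string terms) is a hypothetical illustration and says nothing about graphy's undocumented API or its internal layout.

```javascript
// Hypothetical index: graph -> subject -> predicate -> Set of objects.
const forest = new Map();

// Hypothetical helper: insert one quad, creating branches as needed.
function addQuad(graph, subject, predicate, object) {
    if (!forest.has(graph)) forest.set(graph, new Map());
    const subjects = forest.get(graph);
    if (!subjects.has(subject)) subjects.set(subject, new Map());
    const predicates = subjects.get(subject);
    if (!predicates.has(predicate)) predicates.set(predicate, new Set());
    predicates.get(predicate).add(object);
}

// '' stands for the default graph in this sketch.
addQuad('', 'ex:alice', 'foaf:knows', 'ex:bob');
addQuad('', 'ex:alice', 'foaf:knows', 'ex:carol');
addQuad('', 'ex:alice', 'foaf:name', '"Alice"');

// Walk the trees: each shared graph, subject and predicate is visited once,
// instead of being repeated on every quad.
const lines = [];
let quadCount = 0;
for (const [graph, subjects] of forest) {
    for (const [subject, predicates] of subjects) {
        for (const [predicate, objects] of predicates) {
            quadCount += objects.size;
            lines.push(`${subject} ${predicate} ${[...objects].join(', ')}`);
        }
    }
}
```

The walk touches three quads but only two predicate branches and one subject branch, which is where the savings over quad-at-a-time iteration would come from in a structure like this.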

blake-regalia commented 4 years ago

P.S. thanks for the notice about the URLs, I've updated the documentation.

happybeing commented 4 years ago

Thanks Blake, perfect answer. I see now that my confusion over JS idioms (and not knowing the terminology) held me back here. I appreciate you taking the time to spell it out.