drobilla / serd

A lightweight C library for RDF syntax
https://gitlab.com/drobilla/serd
ISC License

Filter specific graph in nquads #12

Open ktk opened 6 years ago

ktk commented 6 years ago

First, thanks for serdi, very nice & fast library!

I often use it in RDF creation pipelines, and one job I do on a regular basis is converting quads to triples. Sometimes the graph does not matter, but in other cases it would be nice to be able to specify which graph I want in the output and throw away the rest (or vice versa).

I currently do this with pipe filters, but I would feel more comfortable if I could pass that as a parameter to serdi.
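
For illustration, the kind of pipe filter described above could look roughly like the following. This is a minimal sketch and not part of serd: the `TERM` regex and the `filter_graph` function are hypothetical names, the tokenizer is a simplified ASCII-only approximation of N-Quads terms, and only IRI graph names are handled.

```python
import re

# Simplified N-Quads term tokenizer (an assumption, not a full parser):
# IRIs, ASCII blank node labels, and literals with optional datatype or language.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)

def filter_graph(lines, graph_iri):
    """Yield N-Triples lines for the statements whose graph is graph_iri.

    Works line by line, so it can be fed a stream such as sys.stdin.
    """
    wanted = "<" + graph_iri + ">"
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        terms = TERM.findall(stripped)
        # Four terms means subject, predicate, object, graph.
        if len(terms) == 4 and terms[3] == wanted:
            yield " ".join(terms[:3]) + " ."

# Possible use in a pipeline (hypothetical script):
#   import sys
#   for triple in filter_graph(sys.stdin, "http://example/g"):
#       print(triple)
```

Statements in the default graph (three terms) are dropped by this sketch; the inverse selection would flip the comparison on the graph term.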

drobilla commented 6 years ago

Thanks / you're welcome :)

Good idea. I've had the same thought, actually, though I was thinking of taking it a bit further and allowing general patterns (or at least subject and predicate as well). Not sure about blanks... maybe it could just not support them, or do a simple string match, which would still be handy.

The idea of bloating the still very "do one thing and do it well" serdi concerns me a bit, but a separate rdf_grep sort of thing would mostly be the same program with some filtering stuff added anyway, so I guess that doesn't make sense.

Might need to break the API to do this well, though I'm not sure, and I think it's time to break it and clean some things up for the next major version anyway.

ktk commented 6 years ago

The rdf_grep (or tgrep for triple grep?) idea is tempting; I like the idea of not bloating serdi. Streaming is essential for the datasets I use it for.

For grepping I use '<...> <' patterns but that does not work for everything and it's not very nice to write.

drobilla commented 4 years ago

Added this in the serd1 branch with https://github.com/drobilla/serd/commit/116c73a7886e281ef013a3d658c76c6a32160fb4 if you want to give it a shot.

Could still use a bit of polish (and command line flags are starting to run out), but seems to do the job.

drobilla commented 4 years ago

It borrows a shred of SPARQL so you can write an NQuads statement with ?variable syntax and use it with the -g option (for "grep"), e.g.

serdi -g '?s ?p ?o <http://example/g> .' tests/NQuadsTests/nq-syntax-uri-01.nq
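
To make the idea concrete, here is a rough sketch of how matching a statement against such a ?variable pattern could work. It is not serd's implementation: `TERM`, `tokenize`, and `matches` are hypothetical names, the tokenizer is a simplified ASCII-only approximation, and each variable is treated as an independent wildcard (whether repeated variables must bind to the same term is not stated in this thread, so the sketch does not assume it).

```python
import re

# Simplified term tokenizer (an assumption): ?variables, IRIs,
# ASCII blank node labels, and literals with optional datatype or language.
TERM = re.compile(
    r'\?\w+'
    r'|<[^>]*>'
    r'|_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)

def tokenize(line):
    """Split a statement or pattern line into its terms."""
    return TERM.findall(line)

def matches(pattern, statement):
    """Return True if every non-variable pattern term equals the statement term."""
    p_terms = tokenize(pattern)
    s_terms = tokenize(statement)
    if len(p_terms) != len(s_terms):
        return False  # e.g. a quad pattern never matches a plain triple here
    return all(p.startswith("?") or p == s for p, s in zip(p_terms, s_terms))
```

With the pattern from the command above, a quad in `<http://example/g>` matches regardless of its subject, predicate, and object, while quads in other graphs do not.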

ktk commented 4 years ago

Excellent, I will give it a try, thanks! The SPARQL approach makes a lot of sense to me.

drobilla commented 3 years ago

It seemed weird to only have "inclusive" or "exclusive" (like grep -v) filtering, so I changed this to two separate flags: -F and -G (roughly for "filter" and "grep", respectively) in the latest version (on branch serd1-meson) that should hopefully see release soon.

As it happens, the (also new) validation checks had the same problem, and I'm increasingly worried about the rapidly disappearing flag characters, but it seemed questionable to invent some kind of odd command line syntax (like a universal negation flag that negates the thing after it), so I guess this will have to do until some even fancier future version forces the issue.
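
As a small illustration of the inclusive/exclusive split discussed above, the two modes can be thought of as the same predicate applied with opposite sense. This is only a sketch with hypothetical names (`keep_matching`, `drop_matching`); it says nothing about how serdi implements its flags or which flag corresponds to which mode.

```python
from itertools import filterfalse

def keep_matching(statements, predicate):
    """Inclusive mode: lazily yield statements for which predicate is true."""
    return filter(predicate, statements)

def drop_matching(statements, predicate):
    """Exclusive mode (like grep -v): lazily yield statements for which predicate is false."""
    return filterfalse(predicate, statements)
```

Both functions are lazy, so they work on a stream of statements without loading the whole dataset.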