Open: ktk opened this issue 6 years ago
Thanks / you're welcome :)
Good idea. I've had the same thought, actually, though I was thinking of taking it a bit further and allowing general patterns (or at least subject and predicate as well). Not sure about blanks... maybe it could just not support them, or do a simple string match, which would still be handy.
The idea of bloating the still very "do one thing and do it well" serdi concerns me a bit, but a separate rdf_grep sort of thing would mostly be the same program with some filtering stuff added anyway, so I guess that doesn't make sense.
Might need to break the API to do this well, though I'm not sure, and I think it's time to break it and clean some things up for the next major version anyway.
The rdf_grep (or tgrep for triple grep?) idea is tempting; I like the idea of not bloating serdi. Streaming is essential for the datasets I use it for.
For grepping I use '<...> <' patterns but that does not work for everything and it's not very nice to write.
Added this in the serd1 branch with https://github.com/drobilla/serd/commit/116c73a7886e281ef013a3d658c76c6a32160fb4 if you want to give it a shot.
Could still use a bit of polish (and command line flags are starting to run out), but seems to do the job.
It borrows a shred of SPARQL so you can write an NQuads statement with ?variable syntax and use it with the -g option (for "grep"), e.g.
serdi -g '?s ?p ?o <http://example/g> .' tests/NQuadsTests/nq-syntax-uri-01.nq
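To illustrate what a pattern like the one above means, here is a minimal Python sketch of SPARQL-style matching over already-parsed quads. It is not how serd implements it; the names `Quad`, `is_variable`, and `matches` are hypothetical, and it makes the simplifying assumption that every `?variable` is an independent wildcard (a repeated variable is not required to bind to the same value).

```python
from typing import Optional, Tuple

# A parsed statement: subject, predicate, object, graph (None = default graph).
Quad = Tuple[str, str, str, Optional[str]]


def is_variable(term: Optional[str]) -> bool:
    """Treat any term starting with '?' as a wildcard, as in SPARQL."""
    return term is not None and term.startswith("?")


def matches(pattern: Quad, quad: Quad) -> bool:
    """Return True if every non-variable pattern term equals the quad's term."""
    return all(is_variable(p) or p == q for p, q in zip(pattern, quad))


# Assumed example data, mirroring the pattern in the command above.
pattern = ("?s", "?p", "?o", "<http://example/g>")
quads = [
    ("<http://example/s>", "<http://example/p>", "<http://example/o>", "<http://example/g>"),
    ("<http://example/s>", "<http://example/p>", "<http://example/o>", "<http://example/other>"),
    ("<http://example/s>", "<http://example/p>", "<http://example/o>", None),
]

# Keep only the statements in the requested graph.
selected = [q for q in quads if matches(pattern, q)]
```

Only the first quad survives here, because the graph is the one position in the pattern that is not a variable.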
Excellent, I will give it a try, thanks! The SPARQL approach makes a lot of sense to me.
It seemed weird to only have "inclusive" or "exclusive" (like grep -v) filtering, so I changed this to two separate flags: -F and -G (roughly for "filter" and "grep", respectively) in the latest version (on branch serd1-meson) that should hopefully see release soon.
As it happens, the (also new) validation checks had the same problem, and I'm increasingly worried about the rapidly disappearing flag characters, but it seemed questionable to invent some kind of odd command line syntax (like a universal negation flag that negates the thing after it), so I guess this will have to do until some even fancier future version forces the issue.
First, thanks for serdi, very nice & fast library!
I often use it in RDF creation pipelines, and one job I do on a regular basis is to convert quads to triples. Sometimes the graph does not matter, but in other cases it would be nice to be able to specify which graph I want to have in the output and throw away the rest (or vice versa).
I currently do this with pipe filters, but I would feel more comfortable if I could add that as a parameter to serdi.
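For context, the kind of pipe filter described here could look roughly like the following Python sketch. It is an assumption about the workflow, not part of serdi: the `TERM` regex and the `quads_to_triples` function are hypothetical, and the tokenizer is deliberately simplified (it assumes one statement per line and does not handle trailing comments that themselves contain terms).

```python
import re
from typing import Iterable, Iterator

# Simplified N-Quads term: IRI, blank node, or literal with an optional
# datatype or language tag. This is an approximation, not a full grammar.
TERM = re.compile(
    r'<[^>]*>'
    r'|_:\S*[^\s.]'
    r'|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'
)


def quads_to_triples(lines: Iterable[str], graph: str) -> Iterator[str]:
    """Yield N-Triples lines for statements whose graph label equals `graph`."""
    for line in lines:
        terms = TERM.findall(line)
        # Exactly four terms means the statement carries a graph label.
        if len(terms) == 4 and terms[3] == graph:
            yield " ".join(terms[:3]) + " ."


# Assumed example input: two named graphs and one default-graph statement.
example = [
    '<http://example/s> <http://example/p> "a <b> c" <http://example/g> .',
    '<http://example/s> <http://example/p> <http://example/o> <http://example/other> .',
    '<http://example/s> <http://example/p> <http://example/o> .',
]
triples = list(quads_to_triples(example, "<http://example/g>"))
```

Because it works line by line, the filter streams: passing `sys.stdin` as `lines` would let it sit in a pipeline. Inverting the comparison (`terms[3] != graph`) would give the "vice versa" case of dropping one graph and keeping the rest.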