leostera opened this issue 5 years ago (status: open)
There's no AST built for Turtle or N-Triples/N-Quads. Lagra deserialises those directly into a triple store, where you can use lagra:find_all_t/2 (or lagra:find_all_q/2) to get triples (or quads) back out. (I'm going to add some prettier helper functions at some point, but those two, while ugly, should be sufficient for now.)
The choice of not generating a complete AST is quite deliberate. In using most RDF tools, I've found that they tend to load a whole serialised document into RAM, which causes major issues when dealing with large datasets. Part of the design goal is to make it possible to load very large RDF documents and dump the triples into an external data store without needing insane amounts of RAM. To that end, the Turtle and N-Triples/N-Quads parsers are hand-coded, fully incremental parsers that emit triples one at a time as they are identified, without holding the whole input file in memory as an AST.
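The streaming idea can be illustrated outside lagra. The following is a minimal Python generator, not lagra's own parser: it assumes a simplified, one-statement-per-line subset of N-Triples (IRIs, blank nodes, and plain literals without escapes, language tags or datatypes), and the names `stream_triples` and `Triple` are hypothetical. Each triple is yielded as soon as its line is recognised, so memory use is bounded by the longest line rather than by the size of the file.

```python
import io
import re
from typing import Iterator, TextIO, Tuple

# Hypothetical representation: (subject, predicate, object) as raw term strings.
Triple = Tuple[str, str, str]

# Simplified term pattern (an assumption, not full N-Triples): an IRI <...>,
# a blank node _:x, or a plain "..." literal with no escapes, tags or datatypes.
TERM = r'(<[^>]*>|_:\S+|"[^"\\]*")'
LINE = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$')


def stream_triples(source: TextIO) -> Iterator[Triple]:
    """Yield one triple per statement line, never holding the whole file."""
    for lineno, line in enumerate(source, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: unsupported or malformed statement")
        yield m.group(1), m.group(2), m.group(3)


if __name__ == "__main__":
    doc = io.StringIO(
        "# a tiny document\n"
        "<http://ex.org/Dog> <http://ex.org/label> \"dog\" .\n"
    )
    # Each triple could be written to an external store here, one at a time.
    for triple in stream_triples(doc):
        print(triple)
```

Because the caller pulls triples one by one, a malformed line late in a huge file is only reported when the parser reaches it; everything before it has already been handed off.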
At the moment, of course, the only store is the trivial one, which is all in-memory and very inefficient, so I'd expect it to become useless as a result of poor performance well before it becomes useless as a result of insufficient RAM...
Right! I understand. Do you have any interest in reusing the lexing/parsing to build an AST? I could help contribute that.
But perhaps I'm wrong: I'm not looking to get an AST describing particular entities, but rather the core entity classes and their relationships. In general, the ontology itself should be reasonably small compared with the size of the data it describes.
I think I'm confused about what you're trying to do, here.
RDF itself has no concept of an AST. There's a fundamental data model, which is the graph. The graph is typically represented (and manipulated) as a set of triples. There are many different renderings of a graph possible -- multiple different serializations, and each of those can render a graph in many different ways. It's just not useful to talk about the AST of one of those specific serialization formats, unless you're implementing a deserializer.
It sounds more like you want to be able to extract the resources of type rdfs:Class and do stuff with those? Possibly looking at the rdfs:subClassOf relationships between them? You don't need an AST for that: it's just querying the triples with find_all_t/2.
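As a rough illustration of that kind of query, here is a minimal Python sketch over a plain sequence of triples. It assumes each triple is a `(subject, predicate, object)` tuple of N-Triples-style strings; that representation and the `extract_classes` name are hypothetical, not necessarily the shape lagra's find_all_t/2 returns. It only looks at explicit statements and does no RDFS inference.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
RDFS_CLASS = "<http://www.w3.org/2000/01/rdf-schema#Class>"
RDFS_SUBCLASS_OF = "<http://www.w3.org/2000/01/rdf-schema#subClassOf>"

# Hypothetical representation: (subject, predicate, object) term strings.
Triple = Tuple[str, str, str]


def extract_classes(
    triples: Iterable[Triple],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return explicitly declared classes and each class's direct superclasses.

    Single pass, so it works on a list or on a streaming generator. No RDFS
    inference: subjects/objects of rdfs:subClassOf are not added to the class
    set unless they are also declared with rdf:type rdfs:Class.
    """
    classes: Set[str] = set()
    parents: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE and o == RDFS_CLASS:
            classes.add(s)  # s rdf:type rdfs:Class
        elif p == RDFS_SUBCLASS_OF:
            parents[s].add(o)  # s rdfs:subClassOf o
    return classes, dict(parents)
```

For code generation, the class set gives the types to emit and the parent map gives the inheritance edges; instance data (e.g. `<rex> rdf:type <Dog>`) is simply ignored.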
...
Hello again!
I'm playing around with the API, trying to parse a small turtle file and get my hands on the AST to do some code generation.
However, I can't seem to extract it from the store after the lagra_parser_turtle_parser has written to it. Do you have any pointers for me to hack on? Thanks again for the work on lagra 🙌