Expose AST of Turtle file

leostera commented 5 years ago

Hello again!

I'm playing around with the API, trying to parse a small turtle file and get my hands on the AST to do some code generation.

However, I can't seem to extract it from the store after the lagra_parser_turtle_parser has written to it. Do you have any pointers for me to hack on?

Thanks again for the work on lagra 🙌

darkling commented 5 years ago

There's no AST built for Turtle or N-Triples/N-Quads. Lagra de-serialises those directly into a triple store, where you can use lagra:find_all_t/2 (or lagra:find_all_q/2) to get triples (or quads) from it. (I'm going to add some prettier helper functions at some point; until then, those two are ugly but should be sufficient.)

The choice of not generating a complete AST is quite deliberate -- in using most RDF tools, I've found that they have a tendency to load a whole serialised document into RAM, which tends to cause major issues when dealing with large datasets. Part of the design choice is to make it possible to load very large RDF documents and dump the triples in an external data store without having to have insane amounts of RAM. To that end, the Turtle and N-Triples/N-Quads parsers are completely hand-coded and fully incremental parsers, which generate triples one at a time as they are identified, without storing the whole state of the input file as an AST.
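To make the "one triple at a time" idea concrete, here is a minimal sketch in Python (not Erlang, and not lagra's actual parser). It assumes a heavily simplified N-Triples subset: IRIs and plain literals only, one statement per line. The name `iter_triples` is hypothetical.

```python
import io
import re
from typing import Iterator, TextIO, Tuple

Triple = Tuple[str, str, str]

# Simplified N-Triples statement: subject and predicate are IRIs, the object
# is an IRI or a plain literal. Blank nodes, language tags, datatypes and
# escape sequences are deliberately left out of this sketch.
_LINE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(<[^>]*>|"[^"]*")\s*\.\s*$')


def iter_triples(stream: TextIO) -> Iterator[Triple]:
    """Yield triples lazily, one per input line; no AST or full document is kept."""
    for line in stream:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported or malformed line: {line!r}")
        subj, pred, obj = match.groups()
        if obj.startswith("<"):
            obj = obj[1:-1]  # unwrap IRI objects; literals keep their quotes
        yield subj, pred, obj


# Usage: count statements while holding only one triple in memory at a time.
sample = io.StringIO(
    '<http://example.org/a> <http://example.org/knows> <http://example.org/b> .\n'
    '# a comment\n'
    '<http://example.org/a> <http://example.org/name> "Alice" .\n'
)
count = sum(1 for _ in iter_triples(sample))
```

Because the consumer pulls triples from a generator, each one can be written to an external store and then dropped, so memory use does not grow with the size of the input.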

At the moment, of course, the only store is the trivial one, which is all in-memory and very inefficient, so I'd expect it to become useless as a result of poor performance well before it becomes useless as a result of insufficient RAM...

leostera commented 5 years ago

Right! I understand. Do you have any interest in reusing the lexing/parsing to build an AST? I could help contribute that.

But perhaps I am wrong: I am not looking to get an AST describing particular entities, but rather the core entity classes and their relationships. So in general, compared with the size of the data an ontology describes, the ontology itself should be reasonably small.

darkling commented 5 years ago

I think I'm confused about what you're trying to do, here.

RDF itself has no concept of an AST. There's a fundamental data model, which is the graph. The graph is typically represented (and manipulated) as a set of triples. There are many different renderings of a graph possible -- multiple different serializations, and each of those can render a graph in many different ways. It's just not useful to talk about the AST of one of those specific serialization formats, unless you're implementing a deserializer.

It sounds more like you want to be able to extract the resources of type rdfs:Class and do stuff with those? Possibly looking at the rdfs:subClassOf relationships between them? You don't need an AST for that -- it's just querying the triples with find_all_t/2...
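As a rough illustration of that kind of query, here is a Python sketch over a plain in-memory list of triples. The `find_all` helper and its `None`-as-wildcard pattern are assumptions made for this example; they are not the signature of lagra's find_all_t/2, which is an Erlang function. The three IRIs are the standard RDF and RDFS ones.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def find_all(triples: Iterable[Triple], pattern: Pattern) -> List[Triple]:
    """Hypothetical pattern match: None in a position matches any term."""
    return [
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    ]


def class_hierarchy(triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Map each class to its direct superclasses, using only triple queries."""
    triples = list(triples)
    # Every resource declared with rdf:type rdfs:Class starts with no parents.
    parents: Dict[str, List[str]] = {
        s: [] for s, _, _ in find_all(triples, (None, RDF_TYPE, RDFS_CLASS))
    }
    # rdfs:subClassOf edges add parents (and cover classes that were not
    # explicitly typed as rdfs:Class).
    for s, _, o in find_all(triples, (None, RDFS_SUBCLASS_OF, None)):
        parents.setdefault(s, []).append(o)
    return parents


# Usage with a tiny hand-written ontology.
ontology = [
    ("http://example.org/Animal", RDF_TYPE, RDFS_CLASS),
    ("http://example.org/Dog", RDF_TYPE, RDFS_CLASS),
    ("http://example.org/Dog", RDFS_SUBCLASS_OF, "http://example.org/Animal"),
]
hierarchy = class_hierarchy(ontology)
```

The resulting mapping is the sort of structure a code generator could walk, and it is built from two pattern queries without any serialization-level AST.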

darkling / lagra #3