Open HSorensen opened 9 years ago
When reusing a compressed parse tree it can be significant faster to use the serialized parse tree instead of recreating the parse tree from the source code:
Using the Apache Commons Compression routines can yield really good compression ratio of 90%. Write compressed parse tree
parser.setBuildParseTree(true);
ParseTree tree = parser.program();
OutputStream xzOutputStream = new FileOutputStream("program.parsetree.xz");
CompressorOutputStream xzOut = new CompressorStreamFactory().createCompressorOutputStream(CompressorStreamFactory.XZ, xzOutputStream);
ObjectOutputStream xzObjectOutputStream = new ObjectOutputStream(xzOut);
xzObjectOutputStream.writeObject(tree);
xzObjectOutputStream.close();
and retrieve a compressed parse tree
InputStream inputStream = new FileInputStream("program.parsetree.xz");
CompressorInputStream input = new CompressorStreamFactory().createCompressorInputStream(CompressorStreamFactory.XZ, inputStream);
ObjectInputStream objectInputStream = new ObjectInputStream(input);
ParseTree tree=(ParseTree) objectInputStream.readObject();
objectInputStream.close();
Parsing Tree serialization to JSON, XML and other formats is a good idea. One of use cases is unit testing. For example: In my current work project I have several stages of source code processing:
At present parsing stage have a lowest performace (for ANTLR) сompared to other stages. And If I want to test PM stage, I still have to parse source code. It will be great to serialize parsing tree one time and use it for second or third stage testing many times.
@KvanTTT Currently only Java is supported, I would be interested to know how it would work for you. Do you have a setup where it is possible to change the antlr runtime for the Java part of your parsing?
This seems like a good idea. I too am finding that my JUnit test cases are running slowly because my parse (of some existing code I'm trying to analyse) takes 70s. Walking the resulting parse tree has sub-second response times.
Any idea if these changes to the Java classes will be incorporated into the release build?
I can probably prepare a release build of my ANTLR fork, it is based on mainstream v4.5.x
thanks, that's great
@parrt or @sharwell would it be possible to host a "Community Build" somewhere under the antlr.org web-pages, maybe have a "Community Builds" section under the download page ? Also the http://www.antlr.org/download/index.html page could have a sub-directory communitybuilds. Any other name may work of course."Experimental Builds" could also work. Thoughts ?
Bump, I'm also interested in this feature from a unit-testing perspective. I'm not sure I want to go off-mainline for a serializable parse tree, but exploring the concept of reducing the parse tree to something like XML with some level of built-in facilities would be very useful for testing.
I should mention I did accidentally point XStream at an object that had a reference to the root of a parse tree, causing XStream to serialize the whole thing, and it actually produced some not-terrible XML.
Have you considered instead serializing the sequence of decisions made by the parser's prediction algorithm? Deserialization would consist of parsing using a modified ParserATNSimulator
which replays those values from calls to adaptivePredict
(via a queue).
Sam, what you suggest sounds like a more portable approach. can you outline where to do this and how to replay the sequence to recreate the parsetree.
Henrik
On Friday, August 5, 2016, Sam Harwell notifications@github.com wrote:
Have you considered instead serializing the sequence of decisions made by the parser's prediction algorithm?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/1006#issuecomment-237836672, or mute the thread https://github.com/notifications/unsubscribe-auth/AEztNSyxrVVpPwUAxTFEQphBofgARmrgks5qcystgaJpZM4GGOTA .
Hi,
As part of a project I'm working on, I wrote an XML serializer for an ANTLR4 parse tree. In summary, this is how it works:
I feel it would be useful to have the file name (associated with the parse tree) in the top-level node, or some way of refering back to the document that was parsed (I need this in my project, but I think other people may find that useful too).
The line and offset attributes discussed above may be optional, but I believe they are useful because they allow to measure the "source"-size that a subtree carries, and by that I mean, how much of the actual source document(that was parsed) does a subtree describe.
Best regards, Stefan
Is it possible to use JSON instead of verbose XML?
Stefan, do you also have code that can take the XML and recreate a ParseTree ?
Henrik
On Mon, Sep 5, 2016 at 4:48 AM, Stefan Corneliu Petrea < notifications@github.com> wrote:
Hi,
As part of a project I'm working on, I wrote an XML serializer https://github.com/wsdookadr/mdetect/blob/master/src/main/java/com/mdetect/ParseTreeDOMSerializer.java for a parse tree. In summary, this is how it works:
- every node(or parse rule if you will) in the parse tree corresponds to an XML node in the serialized XML form
- a stack is used(and updated along the traversal) to track the parent of the node that's about to be inserted into the XML document
- attributes involving start and end line number are inserted for node
- attributes such as start and end offset of rule should be added (not in place at this time, may be added later on)
I feel it would be useful to have the file name (associated with the parse tree) in the top-level node, or some way of refering back to the document that was parsed (I need this in my project, but I think other people may find that useful too).
Best regards, Stefan
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/1006#issuecomment-244728177, or mute the thread https://github.com/notifications/unsubscribe-auth/AEztNX31d42B1X0jgmAA6Bowo7_G78NRks5qnAGNgaJpZM4GGOTA .
@HSorensen I don't
Maybe generate a json like parse tree data, and wrap it into a ParseTree class. A ParseTree walk through the json data internal, and have a create method use this json data. And the json data should be compatible to all target language ParseTree class.
I was looking into serializing the ParseTree and much to my surprise only 7 classes need to implement the Serializable interface:
With these changes it becomes straight forward to store the parse tree:
and retrieving the parse tree
Some background: I need to run several static source code analysis using many different tree visitors and tree walkers, some of the analysis if done in foreground other in background. The language I am parsing supports inclusion of source files, and it is common to have literally 100's of files that all need to be parsed together, with the serialized parse-tree there is just one file to maintain. Alternatively to serializing the parse tree I would have to maintain all these files together and recreate the parse tree every time I need it.
I realize that this feature probably make most sense for the use case that need to support inclusion of source code see my ANTLR fork or commit ddf9a331
That said if there is interest I will prepare a new branch and pull-request including test cases etc.