antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
16.95k stars 3.26k forks source link

Go is not threadsafe / leaks state between Parser runs #2040

Open bramp opened 6 years ago

bramp commented 6 years ago

I've compiled the mysql parser for Go, and when testing, I noticed my tests would sometimes fail depending on what tests occurred before it. I have a short example here: https://gist.github.com/bramp/625e1a149703710cd3367ea09814e21f

In the example I create a new Lexer and Parser for each test, and if parse "grammars-v4/mysql/examples/ddl_create.sql" or "grammars-v4/mysql/examples/dml_delete.sql" on their own, ANTLR does so without error. But if I parse both of them in the same test (even when using separate Lexer/Parser instances) I get syntax errors on whichever is the 2nd run, for example:

--- FAIL: TestMySqlParser (1.08s)
        mysql_simple_test.go:65: [1] grammars-v4/mysql/examples/dml_delete.sql: SyntaxError 9:108: mismatched input 'left' expecting {<EOF>, '-'}

I track this down to a few bits of shared global state (such as decisionToDFA) that are being altered between runs, and that would change the behaviour of later parser runs. This in itself seems a problem, since the parse should act the same for the same input, but because decisionToDFA is modified it leaks state into the next run.

Also, when I change my test to run all the examples concurrently, and use the go test -race it shows multiple race conditions where data is not being locked correctly, inside decisionToDFA, BasicBlockStartState and and IntervalSet.

I'm going to investigate fixing this, but opening a bug to keep track.

bramp commented 6 years ago

Just FYI, if I modify the Lexer and Parser, and move the creation of lexerAtn, lexerDecisionToDFA into NewMySqlLexer, and deserializedATN and decisionToDFA into the NewMySqlParser, then state isn't leaked between Parsers, and my tests all pass. That does seem suboptimal.

mdakin commented 6 years ago

I think same happens for Java as well (not 100% sure, but remember hitting this while manually testing consecutive runs with a lexer)

sharwell commented 6 years ago

I think same happens for Java as well

Unless you are misusing something, the Java releases (both the reference one and my fork) and my C# release should be thread safe. All three have been heavily tested in highly concurrent environments. The exception is the clearDFA() method, which was created more for testing grammars than a production-use method.

mdakin commented 6 years ago

@sharwell Agreed, either I forgot to call reset on the lexer or something like that.

jaystarshot commented 4 years ago

Just FYI, if I modify the Lexer and Parser, and move the creation of lexerAtn, lexerDecisionToDFA into NewMySqlLexer, and deserializedATN and decisionToDFA into the NewMySqlParser, then state isn't leaked between Parsers, and my tests all pass. That does seem suboptimal.

@bramp Doesn't this cause stack overflow due to excess memory usage?

jaystarshot commented 4 years ago

@sharwell I am getting a lot of Data Races especially in the github.com/antlr/antlr4/runtime/Go/antlr.(*IntervalSet).addInterval() at /runtime/Go/antlr/interval_set.go:27 and github.com/antlr/antlr4/runtime/Go/antlr.(*IntervalSet).contains() at runtime/Go/antlr/interval_set.go:21 for example. This is a Data Race Warning. I am using the init function using to be executed only once from sync.Once() package, Is this a similar bug Or Do we need another issue for this bug?