antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
17.17k stars 3.28k forks source link

ATN.NextToken(state) should be memoized #4560

Closed kaby76 closed 7 months ago

kaby76 commented 7 months ago

I was going through a bug in Antlr4ng when I came across an observation. The computation for ATN.NextToken(ATNState) should be memoized. This method recursively goes through an ATN graph to find the list of tokens that are possible. The computation does not depend on anything but the ATNState number and context (which is null many times). But, interpreting what can be a valid token at a state is very expensive.

kaby76 commented 7 months ago

Seems it either doesn't happen very often, or it's already memoizing.

ericvergnaud commented 7 months ago

Were you able to dig into the ATNConfig hashCode caching topic ?

kaby76 commented 7 months ago

Hi Eric, Only a little. Indeed, I found out that nextToken() is called almost always with a unique state (and almost always with a null context). Jim is doing something cleaver because I never see a call to nextToken() with the same state number ever. I didn't pursue this any further, so that's why I closed this issue off. I do remember the old Antlr4cs fork that did work faster than the "Official" CSharp runtime, so that's something to still look into. But, I have other tasks to do (e.g., add performance testing with those nice little Octave bar graphs to compare before/after with a PR merge in grammars-v4, porting Saxon to C# for Trash, ...).