antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
17.16k stars 3.28k forks source link

[Question] Why is the ATN of the following lexer rules different from the javadoc of ATNState #3297

Closed chengniansun closed 3 years ago

chengniansun commented 3 years ago

I have the following toy lexer

fragment DIGIT
    : [0-9]
    ;

NaturalNumber
    : DIGIT+
    ;

According to the javadoc of ATNState, the ATN of NaturalNumber should start with RuleStartState --> PlusBlockStartState.

But In fact, a path in the ATN looks similar to

RuleStartState -> BasicState -> SetTransition[0-9] ->RuleStopState -> BlockEndState 
          ^                                                               |
          |                                                               |
  BasicState <----------------------  PlusBlockStartState <- ... <- PlusLoopbackState

Could someone advise why the ATN is changed in this way, and is it possible to get a canonical ATN as described in the javadoc of ATNState?

Thanks in advance.

ericvergnaud commented 3 years ago

Please move this to the google discussions group

sharwell commented 3 years ago

It seems like it would be better to convert this to a GitHub discussion instead of moving it to a completely different medium.

image

image

sharwell commented 3 years ago

@chengniansun How did you determine the actual shape of the ATN? I would not expect it to deviate from the form shown in the documentation.

chengniansun commented 3 years ago

Hi @sharwell

I used a debugger to manually check the shape of the ATN. My understanding of the javadoc is that the indegree of RuleStartState and the out-degree of RuleStopState should be both 0. But in the ATN above, there is outgoing transition from the RuleStopState, and incoming transition into the RuleStartState.

If I remove the keyword fragment from DIGIT, then the shape of the ATN becomes consistent with the javadoc.

ericvergnaud commented 3 years ago

It seems like it would be better to convert this to a GitHub discussion instead of moving it to a completely different medium.

image

image

Except most people kind enough to provide support only watch the google group. Also not sure we want to tie ourselves to GitHub that much...

chengniansun commented 3 years ago

Thanks. I moved the question to the discussion group.

sharwell commented 3 years ago

Interesting. I would have expected it to have this:

RuleStartState -> PlusBlockStartState -> RuleTransition -> BlockEndState -> PlusLoopbackState -> LoopEndState -> RuleEndState
                  ^                                                                         /
                  \------------------------------------------------------------------------/

I believe the outgoing transitions from RuleEndState are return state transitions (tail end of RuleTransition from other rules).

chengniansun commented 3 years ago

So, I can assume that RuleStartState has no incoming transitions, and RuleStopState has no outgoing transitions, which follows the javadoc of ATNState.

sharwell commented 3 years ago

Right, and one subset of your original graph is what I would expect for the DIGIT rule:

RuleStartState -> BasicState -> SetTransition[0-9] ->RuleStopState

It makes me wonder if there was a mix-up in the RuleStartState during debugging.

KvanTTT commented 3 years ago

Except most people kind enough to provide support only watch the google group.

But some people don't watch google group.

Also not sure we want to tie ourselves to GitHub that much...

We have discussion: https://github.com/antlr/antlr4/issues/3022. As I understand @parrt has no objection against google group.

chengniansun commented 3 years ago

tate tran

So, for the lexer rule NaturalNumber, there will be no RuleStopState which has no outgoing transitions.