antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

[feature] First-class interspersion support. #4167

Closed: modulovalue closed this issue 10 months ago

modulovalue commented 1 year ago

This is not a bug report; I just wanted to propose a small feature for a future version of ANTLR.

Consider the following ANTLR-based grammar specification for Dart:

https://github.com/dart-lang/sdk/blob/master/tools/spec_parser/Dart.g

After staring at it for a while, you will notice that one pattern is used a lot: interspersion, i.e. a non-empty list of elements L in which each pair of consecutive elements is separated by another element I:

L (I L)*
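
As a rough illustration of what L (I L)* accepts, here is a minimal Python sketch (not ANTLR code) that checks whether a flat token sequence is an interspersion. The function is_interspersion and the predicates is_elem and is_sep are hypothetical stand-ins for the sub-rules L and I, and each is assumed to match exactly one token, which real grammar rules generally do not.

```python
from typing import Callable, Sequence

def is_interspersion(
    tokens: Sequence[str],
    is_elem: Callable[[str], bool],
    is_sep: Callable[[str], bool],
) -> bool:
    """Return True if tokens match L (I L)*, assuming L and I are single tokens."""
    # L (I L)* is never empty: it must start with one element.
    if not tokens or not is_elem(tokens[0]):
        return False
    i = 1
    # Each repetition of (I L) consumes a separator followed by an element.
    while i < len(tokens):
        if i + 1 >= len(tokens):
            return False  # a single leftover token, e.g. a dangling separator
        if not (is_sep(tokens[i]) and is_elem(tokens[i + 1])):
            return False
        i += 2
    return True
```

For example, with str.isalpha as the element test and a comma as the separator, ["a", ",", "b"] is accepted, while [], ["a", ","] and ["a", "b"] are rejected.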

e.g.:

https://github.com/dart-lang/sdk/blob/d52fe19dee6e7157595e13a003ade428101406d4/tools/spec_parser/Dart.g#L281

https://github.com/dart-lang/sdk/blob/d52fe19dee6e7157595e13a003ade428101406d4/tools/spec_parser/Dart.g#L321-L323

https://github.com/dart-lang/sdk/blob/d52fe19dee6e7157595e13a003ade428101406d4/tools/spec_parser/Dart.g#L469

https://github.com/dart-lang/sdk/blob/d52fe19dee6e7157595e13a003ade428101406d4/tools/spec_parser/Dart.g#L523

https://github.com/dart-lang/sdk/blob/d52fe19dee6e7157595e13a003ade428101406d4/tools/spec_parser/Dart.g#L674

https://github.com/dart-lang/sdk/blob/d52fe19dee6e7157595e13a003ade428101406d4/tools/spec_parser/Dart.g#L682

and there are many more.

I think it would be great if there were some syntactic sugar to express interspersion explicitly.

kaby76 commented 1 year ago

What is your proposal?

I have seen a shorthand for e (s e)* in the grammar notation of pegen, the PEG parser generator behind Python's grammar:

# s.e+
#   Match one or more occurrences of e, separated by s. The generated parse tree
#   does not include the separator. This is otherwise identical to (e (s e)*).
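
The documented behaviour of s.e+ can be sketched with a tiny parser combinator. This is a minimal Python sketch under stated assumptions, not pegen's implementation: a parser is modelled as a function taking (tokens, pos) and returning (value, new_pos) or None, and the helper names match and gather are hypothetical. Like e (s e)*, it requires at least one element and stops before a separator that is not followed by an element, but it returns only the elements.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# A parser takes (tokens, pos) and returns (value, new_pos) on success, or None on failure.
Parser = Callable[[Sequence[str], int], Optional[Tuple[Any, int]]]

def match(pred: Callable[[str], bool]) -> Parser:
    """Hypothetical helper: match one token satisfying `pred`."""
    def parse(tokens: Sequence[str], pos: int) -> Optional[Tuple[Any, int]]:
        if pos < len(tokens) and pred(tokens[pos]):
            return tokens[pos], pos + 1
        return None
    return parse

def gather(sep: Parser, elem: Parser) -> Parser:
    """Sketch of `s.e+`: behaves like e (s e)*, but separators are dropped."""
    def parse(tokens: Sequence[str], pos: int) -> Optional[Tuple[Any, int]]:
        first = elem(tokens, pos)
        if first is None:
            return None  # at least one element is required
        value, pos = first
        items: List[Any] = [value]
        while True:
            s = sep(tokens, pos)
            if s is None:
                break
            e = elem(tokens, s[1])
            if e is None:
                break  # (s e) failed as a whole: leave the separator unconsumed
            items.append(e[0])  # keep the element, discard the separator's value
            pos = e[1]
        return items, pos
    return parse
```

For example, gather(match(lambda t: t == ","), match(str.isidentifier)) applied to ["a", ",", "b"] at position 0 returns (["a", "b"], 3): the comma is consumed but does not appear in the result.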

In ANTLR, the dot is the wildcard that matches any terminal, so this particular notation would conflict with existing syntax. In my opinion, it doesn't add much value.

modulovalue commented 1 year ago

What is your proposal?

I think someone more deeply familiar with ANTLR's grammar syntax would be better qualified to propose concrete syntax, so I did not attempt that.

Your example syntax looks great to me, except that I would perhaps prefer the separator, rather than the element, to come right before the plus, e.g.:

# e.s+
#   Match one or more occurrences of e, separated by s. The generated parse tree
#   does not include the separator. This is otherwise identical to (e (s e)*).
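
To make the proposed sugar concrete, here is a minimal Python sketch that desugars the hypothetical e.s+ notation into the plain ANTLR form e (s e)*. This notation is not an ANTLR feature; the function name desugar_gather and the restriction to bare identifiers and single-quoted literals are illustrative assumptions. Since the dot is ANTLR's wildcard, today e.s+ would most likely be read as "e, any token, one or more s", so a purely textual rewrite like this one would silently change the meaning of such existing rules; a real implementation would need distinct syntax or an explicit disambiguation rule.

```python
import re

# Hypothetical notation from this thread: `elem.sep+` means one or more `elem`
# separated by `sep`. Elements must be identifiers; separators may be
# identifiers or single-quoted literals such as ','.
_GATHER = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*|'(?:[^'\\]|\\.)*')\+")

def desugar_gather(rule_body: str) -> str:
    """Rewrite every `e.s+` occurrence into plain ANTLR `e (s e)*`."""
    return _GATHER.sub(
        lambda m: f"{m.group(1)} ({m.group(2)} {m.group(1)})*", rule_body
    )
```

For example, desugar_gather("arguments : expression.COMMA+ ;") returns "arguments : expression (COMMA expression)* ;", and rule bodies without the sugar are left unchanged.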

modulovalue commented 10 months ago

I'm going to close this issue as antlr4 appears to be in maintenance mode.