antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
16.98k stars 3.26k forks source link

"EMPTY" is a reserved word in CSharp target #3199

Open kaby76 opened 3 years ago

kaby76 commented 3 years ago

As the title says, EMPTY is a reserved symbol in grammars for the C# target (and likely in the alt tool/runtime Antlr4cs). That's because it's defined in ParserRuleContext here. If you define EMPTY as a lexer symbol in your grammar, the compiler will give warnings. For further information, see this issue. The workaround is to just rename EMPTY to EMPTY_ in your grammar. (I use trrename to bulk rename such conflicts quickly.)

KvanTTT commented 3 years ago

Could be added to badwords list for C#.

kaby76 commented 3 years ago

Ha! C# is the only language without "bad words". Goodness, what's a language without of few colorful metaphors. (All the other targets have the API Cpp, Dart, Go, JavaScript, Java, etc.)

KvanTTT commented 3 years ago

This is because C# allows keyword escaping. But EMPTY seems like is not the case and I guess such a set should be added to C#. It should contain EMPTY, children, _start, _stop, and other members that potentially conflict with grammar rule names. I'm curious, can this word set be generated using reflection?

C# "bad words" is described in CSharp.stg.

kaby76 commented 3 years ago

Goodness, why wasn't this mapping added for each of the other targets? This seems like an easy change and it would have prevented hundreds, if not thousands, of renames in grammars-v4 to remove symbol conflicts that I and others manual made over many months. (I implemented and used trrename a few months ago to change whole-scale the remaining grammars--half--in grammars-v4 to complete the renaming to eliminate the symbol conflicts.) Symbol rewrite in the generated code was suggested in 2015 but nothing came of it.

_stop, _start wouldn't have to be added because they're not legal Antlr nonterminal names. But, maybe "children", and whatever else is being used as a symbol name but that's a method or field name in the generated code. And, move the actual "badWords" list to this mapping.

KvanTTT commented 3 years ago

Goodness, why wasn't this mapping added for each of the other targets?

Because other targets don't support keyword escaping. Maybe Swift also supports (see docs).

Symbol rewrite in the generated code was suggested in 2015 but nothing came of it.

Yes, it's a good idea. Just add _ suffix to conflicting identifiers at the generation step.

kaby76 commented 3 years ago

Goodness, why wasn't this mapping added for each of the other targets?

Because other targets don't support keyword escaping.

Good grief. We didn't add the code to the .stg's (and rip out "badWords" elsewhere) because it's not already added.

This is really an easy change. Will give a try.

kaby76 commented 3 years ago

StringTemplate is delightful code! I just wish it was possible to pass the templates for code gen into the Antlr tool via command line, so I could write custom parsers, and new targets. As a prototype, I emptied out the "badWords" list for the Java target, then refactored it and added it to Java.stg. I also needed to change a few places where names need to be mapped to altered form. I tried it out on a grammar with "null" and it works beautifully. Attached is the Java.stg code. Java.txt. It should be easy to change the other templates and take care of symbol conflicts once and for all.

KvanTTT commented 3 years ago

It would be great!

Attached is the Java.stg code.

I recommend using GitHub Gist or another code-sharing service because they provide code highlighting, editing, and other useful features. Also, they are more convenient for viewers.

KvanTTT commented 2 years ago

@kaby76 I'm not sure that placing of such keywords into .stg code is a very good idea:

  1. It's redundant because all dictionary values corresponds to their keys plus suffix (@ for C#) or prefix (_ for others).
  2. In theory it's possible that replaced value also conflicts with runtime keywords. Especially it's actual for C languages where _ using is widespread.

Instead I suggest replacing on ANTLR side. We can have prefix or suffix fields for every runtime. For instance, C# will have empty suffix but @ prefix. Swift will have both suffix and prefix (``). Other runtimes will have_` suffix.

I'm going to fix it in my ANTLR fork together with other tasks that I consider important especially for grammars (first of all, introducing of case insensitive option).

kaby76 commented 2 years ago

I agree, just rename the generated symbols so there won't be a conflict ever. Ideally I'd like to pass the templates to the tool to provide even more flexibility, an important design consideration with the cross-target testing in grammars-v4.

On Sat, Nov 20, 2021, 11:12 AM Ivan Kochurkin @.***> wrote:

@kaby76 https://github.com/kaby76 I'm not sure that placing of such keywords into .stg code is a very good idea:

  1. It's redundant because all dictionary values corresponds to their keys plus suffix (@ for C#) or prefix (_ for others).
  2. In theory it's possible that replaced value also conflicts with runtime keywords. Especially it's actual for C languages where _ using is widespread.

Instead I suggest replacing on ANTLR side. We can have prefix or suffix fields for every runtime. For instance, C# will have empty suffix but @ prefix. Swift will have both suffix and prefix (). Other runtimes will have _` suffix.

I'm going to fix it in my ANTLR fork together with other tasks that I consider important especially for grammars (first of all, introducing of case insensitive option).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/antlr/antlr4/issues/3199#issuecomment-974672715, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACW4UC5TNAWUPAYOCHWZAZDUM7CIRANCNFSM46HU6DOQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

NickStrupat commented 2 years ago

Name collisions could be resolved by placing the new modifier on the generated lexer/parser class members which collide with the library member.

The EMPTY token member can be retrieved via <LexerName>Lexer.EMPTY and the original ParserRuleContext.EMPTY member can be retrieved by static cast of your instance to ParserRuleContext.

KvanTTT commented 2 years ago

We are working on the problem of conflicting identifiers.

NickStrupat commented 2 years ago

@KvanTTT you're awesome! Wondering if this issue I found might be related or of interest:

https://github.com/antlr/antlr4/issues/3503