antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License

Unified Actions Language #1045

Closed KvanTTT closed 7 years ago

KvanTTT commented 8 years ago

I like ANTLR very much, but in my opinion one unpleasant thing still exists in this tool.

I propose the following simplified syntax constructions for a language that will be translated to the target language (Java, C#, Python, JavaScript) during the lexer/parser generation step, i.e. a Unified Actions Language (UAL):

UAL code could be enclosed in this construction: {{ UAL code }} (or other brackets). Context-dependent predicates will also be available via the $ syntax. The usual language-dependent actions will remain available via the usual {} syntax (but are not recommended).

Advantages:

This ugly ECMAScript grammar with a large amount of duplicated code will be transformed into a new one.

For example, I use the following semantic predicate:

stat
    : {java5}? 'goto' ID ';'

But I did not declare the java5 variable in the @members {boolean java5 = true;} section. In this case ANTLR would report the error "java5 member is not declared".

I mean that the following code fragments will be translated to the same actions with correct formatting (without useless spaces and with a common code style).

@members
{public bool AspTags = true;
...
}
@members
{

public bool AspTags = true;
...
}

Disadvantages:

So the declarative approach (tokens and rules) and the imperative approach (actions) will be linked together by UAL.

It would be really cool to bring a Unified Actions Language to ANTLR!

ericvergnaud commented 8 years ago

Hi, one way to overcome those limitations is to derive from an abstract parser. This provides runtime independence and opens a much wider range of possibilities than a dedicated language. Eric

KvanTTT commented 8 years ago

@ericvergnaud this approach will cause code duplication. Also, errors will not be handled by ANTLR during the parser generation step.

A parser should be used for code parsing, i.e. AST building. ANTLR can generate a parser for different runtimes if the grammar is context-free, but it cannot do so if some syntax is context-dependent (Heredoc in PHP, for example). However, parsing such constructions requires only a small set of basic operations (string comparison, variable storage). In the worst case, the usual actions can be used.

More complex actions not related to the parsing process should be placed in Visitor or Listener classes.

ericvergnaud commented 8 years ago

@KvanTTT sorry but I use this approach in a multi target parser and it does not at all suffer from the diseases you describe. Maybe you misunderstood. What I do is I create an abstract MyParser class which I declare in my grammar using the superClass option. This class has all the necessary utilities required for predicates, which is the only piece that needs to be in the grammar. There is no code duplication (unless you consider that C#/Python/JavaScript code duplicates Java code). And error handling is exactly the same as if the code was inline. See https://github.com/prompto/prompto-grammars for an example (the target specific abstract parsers are under their respective repos)

KvanTTT commented 8 years ago

There is no code duplication (unless you consider that C#/Python/JavaScript code duplicates Java code).

@ericvergnaud, I do consider that duplication. In your case, errors in actions can be caught only at compile time, not at the parser generation step (except for a few actions with context-dependent attributes that start with the $ symbol, e.g. $ID.text, $ID.line, etc.). Moreover, grammars with imports and different code for different runtimes are quite hard to read and not descriptive.

Consider the following grammars: Python3.g4, ECMAScript.PythonTarget.g4, PHPLexer.g4 and other grammars with inlined actions. I developed the PHP grammar for the C# runtime. But if somebody else wants to use this grammar with the Java runtime, the parts of the grammar with actions must be completely rewritten.

ericvergnaud commented 8 years ago

I believe you misunderstood my grammars, there are 3 dialects E, O and S, which support javascript, C#, Java and Python grammar fragments. The javascript, python etc... grammars contain those fragments and are not target specific actions.

Most if not all of the target specific code in the grammars you reference could be moved to an abstract parent parser class, declared with the superClass option. And surely you don't expect antlr to provide a meta language able to cover all those use cases. This will never happen.

parrt commented 7 years ago

We've discussed this a few times over the years. Not a bad thing but I'm going to close. thanks!

kasbah commented 6 years ago

Isn't "Cross language actions embedded within grammars" something like this?

parrt commented 6 years ago

@kasbah yeah that works for a small set of known things like printing.

gh-markt commented 4 years ago

For what it's worth, you can't even use superclasses to try and remove target dependency from the grammar without ironically introducing further target dependency, as the means by which the superclass name should be imported into the parser in the @header section can still be language dependent.

Also, any usage of arguments, locals, or return values with parser rules introduces target dependency into the grammar file, as how some variables are declared in the first place varies across targets. Booleans and strings are boolean and String in Java, respectively, while they are bool and std::string in C++, for example.

Even if no custom actions at all are used anywhere in the grammar, and all target-specific code is always delegated to listeners or visitors, there is no apparent way to develop a target-language independent grammar for anything but very simple kinds of parsers. Separate grammars must always be developed for each intended target, at least for the portions of the grammars that have any such declarations, which in some cases may be substantial.

Even if only one target is intended as the final target for a project that uses ANTLR, if that target happens to be something other than Java, a separate Java-compatible grammar must still be produced (with all the declarations that would otherwise have let it express the various states the parser might require during visits or listening removed) if one wants to use the development tools that interoperate with ANTLR. This makes regression testing to verify that the grammar actually works much harder, as the intended target grammar and the Java one might diverge.

ericvergnaud commented 4 years ago

Sorry to somewhat disagree, but I use superclasses to address that need with the same grammar in Java, C#, Python and JavaScript. I restrict myself to predicates, which is the only thing the parser needs to know. Agree that I would not be able to use the same syntax in C++ and other languages, but luckily I don't have that requirement.

gh-markt commented 4 years ago

Which is, as I said, target dependency. To import a desired superclass into the parser, you need to use something like import <x> in Java, while in C++ you may need to use at least one line of the form #include "file.h". I can imagine that you might be able to get away with using the same syntax as Java for importing a superclass name in both C# and Python in this regard, although I thought that the syntax for importing a class from Javascript was something like import <x> from "filename.js", which is quite a bit different.

As you say, it is lucky you don't have the requirement of needing to use C++, but if C++ does happen to be the intended target, unless the grammar is trivial it is impossible to use the same grammar file both for a C++ target and as the file to still be used directly by the development tools written for ANTLR for regression testing. I mean no disrespect, but I hope you can appreciate that saying "it works for me" doesn't actually address the problem.

ericvergnaud commented 4 years ago

Well, no disrespect either, but "you can't even use superclasses to try and remove target dependency" was misleading since you can for most supported targets. If the C++ target does not include superclass.h in parser.h then maybe that’s actually a bug?

gh-markt commented 4 years ago

The Java target does not import the superclass from the necessary package either, unless you impose the requirement that the files produced by antlr always be in the same package as the superclass, in which case the import statement isn't required. This isn't always practical, however. As for C++, do you then impose the requirement that the superclass must be in a file called "superclass.h"? Possible, but again, not always practical. Should the filename be all lower case although the class name begins with a capital letter or uses camelcase, for example?

And the problem remains for local variables declared as locals [ ... ] in a rule. There is just no way to do this portably between languages that use different names or syntaxes for making declarations. It's true that it works for "most" supported targets, but at the end of the day, as long as targets exist that it doesn't work for, it's still not target language independent.

I feel like the tool could benefit from an option which could be specified as a grammar option or on the command line to simply exclude any declarations which mention types, so that even if a grammar was written for a particular target which deviates enough from Java to otherwise have problems, the base ANTLR development tools could still be used to analyze it.

ericvergnaud commented 4 years ago

We do impose that the files produced by antlr reside in the same Java package, the same C# namespace and so forth… and if that is not the case, I would definitely impose that in C++ the file should include the superclass header if a superclass is defined (and the name should be that of the superclass).

Not sure what ‘practical’ means here; how impractical can it be to have all generated files in the same folder? It may not fit everybody’s habits, but it certainly creates a stable context for troubleshooting…

As for the variables, you are right. There’s no way. And providing one would probably break with the next target... Tbh I’ve never used variables myself. The only code I tolerate in my grammars is semantic predicates.

gh-markt commented 4 years ago

I think that the best compromise in that regard would be to have an antlr option to exclude all of the variable declarations from its output, so that at least the base antlr tools can analyze the grammar as they ordinarily would, as if it had no target-specific content. This doesn't make the grammar file target-independent, but it at least has the benefit that having such dependencies in the file does not preclude the base antlr development tools from being able to work with and analyze it when that option is specified.

KvanTTT commented 2 years ago

Just a thought: Haxe can be used as such a language. It's the universal language that has a lot of targets.

ericvergnaud commented 2 years ago

Mmm, that would require embedding the Haxe VM in every target... kind of a show stopper. What we need is a language that translates to any target language, which imho is out of reach in the very specific context of ANTLR generated lexers and parsers.

KvanTTT commented 2 years ago

Not necessarily the Haxe VM. Its compiler can convert code to different languages (C#, Java, JavaScript, Python, C++, etc). The full list is here: https://haxe.org/documentation/introduction/compiler-targets.html

what we need is a language that translates to any target language, which imho is out of reach in the very specific context of ANTLR generated lexers and parsers

Yes, we need it. The specific part is the parser API, but it's not a big problem (it is like the string API included by default in any language). Also, this language should eventually be Turing-complete if we want to get rid of superclasses entirely.

At the very least we can use part of the Haxe compiler, or just the Haxe grammar, and convert code fragments manually with the correct mapping between generated and input code. Or we could just develop our own language (but in my opinion, it's better to use at least the grammar of an existing language).

ericvergnaud commented 2 years ago

Sorry but not sure I agree with your findings. Haxe is able to compile code to different targets, but what is needed is source code fragments to be embedded in the lexer or parser code, which happens long before compilation. And not sure why you would want to get rid of superClasses ?

parrt commented 2 years ago

Hi. Yep. We considered an imperative actions language years ago; I even had a name; NIL= neutral imperative language. In the end, we abandoned the effort.

KvanTTT commented 2 years ago

Haxe is able to compile code to different targets, but what is needed is source code fragments to be embedded in the lexer or parser code, which happens long before compilation.

Yes, I know. That's why I suggest using only the grammar of such a "universal" language, not its complete infrastructure. Maybe you are right and it makes more sense to use a DSL than such a language.

And not sure why you would want to get rid of superClasses?

Because it's logic duplication if you want to use several targets. It entails more errors and more effort during grammar development. But as the first step, it's not necessary to get rid of super classes, just use limited syntax for universal actions and predicates.

Hi. Yep. We considered an imperative actions language years ago; I even had a name; NIL= neutral imperative language. In the end, we abandoned the effort.

Good idea. Do you have any existing work on it? I'm going to experiment with such a language sometime later.

parrt commented 2 years ago

@KvanTTT can't remember all the reasons but it just seemed to not be that useful in practice except for simplest predicates / symbol table stuff.

gh-markt commented 2 years ago

@parrt But how are those situations themselves not important or useful?

parrt commented 2 years ago

For real languages, it turns out it wasn't enough to do anything but trivial problems. The minute you need conditionals and things like that, you start to expose a need for a full language. You might as well pick a language, restricted a bit, and then translate that to various targets if you want.

It's easy enough to do yourself. Just leave {@marker}-type placeholders in the code and do a preprocessor pass over the grammar to fill in the various things you need for a variety of languages.
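The preprocessor pass described here could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the marker name `java5`, the `SNIPPETS` table, and the function `expand_markers` are hypothetical and not part of ANTLR. The script rewrites the grammar text before it is handed to the ANTLR tool.

```python
import re

# Hypothetical per-target snippets, keyed by marker name (invented for illustration).
SNIPPETS = {
    "Java": {"java5": "this.java5"},
    "Python3": {"java5": "self.java5"},
}

# Matches placeholders of the form {@name} left in the grammar.
MARKER = re.compile(r"\{@(\w+)\}")

GRAMMAR = """\
stat
    : {@java5}? 'goto' ID ';'
    ;
"""


def expand_markers(grammar: str, target: str) -> str:
    """Replace every {@name} placeholder with the target-specific code in braces."""
    snippets = SNIPPETS[target]

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in snippets:
            # Fail loudly so a missing snippet is caught before ANTLR runs.
            raise KeyError(f"no snippet for marker {name!r} in target {target!r}")
        return "{" + snippets[name] + "}"

    return MARKER.sub(substitute, grammar)


if __name__ == "__main__":
    for target in SNIPPETS:
        print(f"// {target}")
        print(expand_markers(GRAMMAR, target))
```

One design consequence of this approach is that the grammar on disk is no longer directly valid ANTLR input, so any tooling that reads the .g4 file has to run after the expansion step.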

KvanTTT commented 2 years ago

It's easy enough to do yourself. Just leave {@marker} type in the code and do a preprocessor pass over the grammar to fill in the various things you need for a variety of languages.

BTW, there is another missing feature in ANTLR (most likely in StringTemplate). Unfortunately, it's not possible to extract the text span of the original action/predicate from generated code, because ANTLR does not provide a mapping between generated code and input for actions/predicates. It's possible to use marker comments for actions/predicates: {/*marker1_start*/marker/*marker1_stop*/}, find them in generated code, and associate them with the input markers. But it looks ugly and quite complicated.

Such a feature would improve the grammar development experience, because all errors (including errors in actions and predicates) would be shown in the input grammar, not in generated code.
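To make the marker-comment workaround concrete, here is a minimal Python sketch of the matching step. It assumes the target language passes `/*...*/` comments through to the generated code unchanged and that marker names are unique. The helper names (`find_spans`, `line_col`, `map_generated_to_grammar`) and both sample texts are invented for illustration. This is not how ANTLR or StringTemplate work internally.

```python
import re

# Matches /*NAME_start*/ ... /*NAME_stop*/ and captures NAME and the enclosed code.
MARKED = re.compile(r"/\*(\w+)_start\*/(.*?)/\*\1_stop\*/", re.DOTALL)

# Hypothetical grammar fragment and hypothetical generated code carrying the same marker.
GRAMMAR = "stat\n    : {/*a1_start*/java5/*a1_stop*/}? 'goto' ID ';'\n    ;\n"
GENERATED = (
    "class P {\n"
    "  boolean sempred() {\n"
    "    return /*a1_start*/java5/*a1_stop*/;\n"
    "  }\n"
    "}\n"
)


def find_spans(text: str) -> dict:
    """Return {marker name: (start offset, end offset)} of the code between markers."""
    return {m.group(1): (m.start(2), m.end(2)) for m in MARKED.finditer(text)}


def line_col(text: str, offset: int) -> tuple:
    """Convert a character offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, offset) + 1
    col = offset - (text.rfind("\n", 0, offset) + 1) + 1
    return line, col


def map_generated_to_grammar(grammar: str, generated: str) -> dict:
    """Map each marker to (position in generated code, position in grammar)."""
    in_grammar = find_spans(grammar)
    in_generated = find_spans(generated)
    return {
        name: (line_col(generated, in_generated[name][0]),
               line_col(grammar, in_grammar[name][0]))
        for name in in_grammar
        if name in in_generated
    }


if __name__ == "__main__":
    print(map_generated_to_grammar(GRAMMAR, GENERATED))
```

Unlike a `#line`-style directive, this keeps column information, at the cost of injecting and later stripping the marker comments.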

parrt commented 2 years ago

Long ago I think I did this for C using the preprocessor #line command. :) A cool idea. Would it just be generating comments into generated code at start/end of action code?

KvanTTT commented 2 years ago

Long ago I think I did this for C using the preprocessor #line command.

I think it's not a very good idea, because #line directive only provides line numbers, but not columns. Also, it won't work with targets other than C.

Would it just be generating comments into generated code at start/end of action code?

Yes, moreover, after mapping is built, marker comments can be removed both from generated code and from the grammar. I've implemented such an idea in my prototype project for grammar development (test is also available). The implementation is quite complicated and not very efficient. That's why it's better to provide such mapping directly via StringTemplate.

parrt commented 2 years ago

Interesting. Anyway, yeah, unlikely that I can manage any new features like this.

udif commented 2 years ago

What if we live with generic code inside semantic predicates, but customize their implementation? If we just use { identifier }? or { func() }? it probably solves the issue for 99% of the languages, and you can write only the language specific section separately.

stat
    : { java5() }? 'goto' ID ';'

Then have:

@parser::members::java {
  boolean java5() {
    ...
  }
}
@parser::members::cpp {
  bool java5() {
    ...
  }
}

Or a similar syntax ? (applied to other @parser:: and @lexer:: tags as well).

The existing generic @parser::members will keep working as-is, keeping backward compatibility.
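ANTLR does not accept the `@parser::members::<target>` syntax proposed here, but the idea can be approximated outside the tool with a small preprocessing script. The following Python sketch is one hypothetical way to do it: it keeps the section for the chosen target, renames it to a plain `@parser::members` (or `@lexer::members`) block, and drops the others. The function name `select_target`, the sample grammar, and the `return true;` bodies are assumptions made for illustration. The brace matching is naive and ignores braces inside strings or comments.

```python
import re

# Matches the opening of a hypothetical per-target section, e.g. "@parser::members::java {".
SECTION = re.compile(r"@(parser|lexer)::members::(\w+)\s*\{")

GRAMMAR = """\
@parser::members::java {
  boolean java5() { return true; }
}
@parser::members::cpp {
  bool java5() { return true; }
}

stat
    : { java5() }? 'goto' ID ';'
    ;
"""


def select_target(grammar: str, target: str) -> str:
    """Keep only the per-target members sections for `target`, as plain members blocks."""
    out = []
    pos = 0
    while True:
        m = SECTION.search(grammar, pos)
        if m is None:
            out.append(grammar[pos:])
            break
        out.append(grammar[pos:m.start()])
        # Naive brace matching: walk forward until the opening brace is closed.
        depth = 1
        i = m.end()
        while depth > 0:
            if i == len(grammar):
                raise ValueError("unbalanced braces in members section")
            if grammar[i] == "{":
                depth += 1
            elif grammar[i] == "}":
                depth -= 1
            i += 1
        if m.group(2) == target:
            # Re-emit the body (including its closing brace) under the generic name.
            out.append("@" + m.group(1) + "::members {" + grammar[m.end():i])
        pos = i
    return "".join(out)


if __name__ == "__main__":
    print(select_target(GRAMMAR, "cpp"))
```

The rules themselves are left untouched, so the predicate `{ java5() }?` stays identical for every target and only the selected members block differs.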

KvanTTT commented 2 years ago

@udif It looks like a good idea! Not an ideal solution, but quite workable.

@parrt we can use such new syntax to consolidate our runtime tests, see https://github.com/antlr/antlr4/pull/3775

parrt commented 2 years ago

I've considered such an imperative language multiple times and rejected it each time after I thought about it. sorry.

KvanTTT commented 2 years ago

@udif doesn't suggest a universal language, but a way to handle all runtime-specific code within one grammar file. It's especially useful for new tests that will be able to handle specific code that is not being tested yet.

parrt commented 2 years ago

Yep. That's what I considered and rejected multiple times. Are you going to have operators? Soon you have yet another language to maintain.

udif commented 2 years ago

Yep. That's what I considered and rejected multiple times. Are you going to have operators? Soon you have yet another language to maintain.

No operators, just a single function call, that should be mappable to Java, C++, Python or any reasonable language I can think of. Everything else is handled in language-specific sections.

KvanTTT commented 2 years ago

No operators, just a single function call, that should be mappable to Java, C++, Python or any reasonable language I can think of. Everything else is handled in language-specific sections.

Yes. Maybe some minor fixes are required (to handle $self, $lexer, or $parser in all targets, https://github.com/antlr/antlr4/blob/master/doc/actions.md#parser-rule-attributes), but they should be made anyway, since such functionality is already present in ANTLR. Function call syntax is the same for all runtimes that we support.

parrt commented 2 years ago

Function calls won't be enough. You don't have access to context of invoking rule. Plus I'm not adding more complexity to the tool. Feel free to make a subclass and override some functions. Should work for most languages.

ericvergnaud commented 2 years ago

Indeed subclassing is the way to go

KvanTTT commented 1 year ago

@parrt could you please move this issue to the GitHub discussions?