Closed KvanTTT closed 7 years ago
Hi, one way to overcome those limitations is to derive from an abstract parser. This provides runtime independence and opens a much wider range of possibilities than a dedicated language. Eric
@ericvergnaud this approach will cause code duplication. Also, errors will not be caught by ANTLR during the parser generation step.
A parser should be used for code parsing, i.e. AST building. ANTLR is able to generate a parser for different runtimes if the grammar is context-free, but it cannot do so if some syntax is context-dependent (heredoc in PHP, for example). Parsing such syntax constructions requires only a small set of basic operations (string comparison, variable storage); in the worst case, ordinary actions can be used.
More complex actions not related to the parsing process should be placed into Visitor or Listener classes.
@KvanTTT sorry, but I use this approach in a multi-target parser and it does not at all suffer from the diseases you describe. Maybe you misunderstood. What I do is create an abstract MyParser class, which I declare in my grammar using the superClass option. This class has all the necessary utilities required for predicates, which is the only piece that needs to be in the grammar. There is no code duplication (unless you consider that C#/Python/JavaScript code duplicates Java code). And error handling is exactly the same as if the code were inline. See https://github.com/prompto/prompto-grammars for an example (the target-specific abstract parsers are under their respective repos)
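For readers unfamiliar with the technique, here is a minimal sketch of what such a setup can look like (grammar and class names are illustrative, not taken from the prompto repos). The grammar names an abstract base class via the superClass option, and the predicate calls a method that each target implements once in its own base class:

```antlr
// MyLang.g4 (hypothetical): the grammar itself stays target-neutral.
parser grammar MyLangParser;

options { superClass = MyLangParserBase; }

// isJava5() is declared in the abstract MyLangParserBase class, which
// each runtime (Java, C#, Python, ...) provides in its own language.
stat
    : { isJava5() }? 'goto' ID ';'
    | ID '=' ID ';'
    ;
```

The Java side would then supply an abstract class extending `Parser` with a `boolean isJava5()` helper, and the other runtimes would mirror it in their own syntax.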
There is no code duplication (unless you consider that C#/Python/JavaScript code duplicates Java code).
@ericvergnaud, I consider that to be duplication. Errors in actions in your case can be detected only at compilation time, not at the parser generation step (except for a few actions with context-dependent predicates that start with the `$` symbol, i.e. `$ID.text`, `$ID.line`, etc.). Moreover, grammars with imports and different code for different runtimes are quite hard to read and not descriptive.
Consider the following grammars: Python3.g4, ECMAScript.PythonTarget.g4, PHPLexer.g4, and other grammars with inlined actions. I developed the PHP grammar for the C# runtime, but if somebody else wants to use this grammar with the Java runtime, the parts of the grammar containing actions have to be completely rewritten.
I believe you misunderstood my grammars. There are 3 dialects, E, O and S, which support JavaScript, C#, Java and Python grammar fragments. The JavaScript, Python, etc. grammars contain those fragments, which are not target-specific actions.
Most if not all of the target-specific code in the grammars you reference could be moved to an abstract parent parser class, declared with the superClass option. And surely you don't expect ANTLR to provide a meta-language able to cover all those use cases; that will never happen.
We've discussed this a few times over the years. Not a bad thing but I'm going to close. thanks!
Isn't "Cross language actions embedded within grammars" something like this?
@kasbah yeah, that works for a small set of known things like printing.
For what it's worth, you can't even use superclasses to try to remove target dependency from the grammar without ironically introducing further target dependency, as the means by which the superclass name is imported into the parser in the `@header` section can still be language-dependent.
Also, any usage of arguments, locals, or return values with parser rules introduces target dependency into the grammar file, as how variables are declared in the first place varies across targets: booleans and strings are `boolean` and `String` in Java, respectively, while they are `bool` and `std::string` in C++, for example.
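To make the point concrete, here is a hypothetical rule written for the Java target; the C++ variant of the very same rule must spell the type differently, so one grammar file cannot serve both:

```antlr
// Java target: the declared return type uses Java syntax.
expr returns [String value]
    : ID { $value = $ID.text; }
    ;

// For the C++ target the declaration would instead have to read:
//   expr returns [std::string value]
```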
Even if no custom actions at all are used anywhere in the grammar, and all target-specific code is always delegated to listeners or visitors, there is no apparent way to develop a target-language-independent grammar for anything but very simple kinds of parsers. Separate grammars must always be developed for each intended target, at least for the portions of the grammars that have any such declarations, which in some cases may be substantial.
Even if only one target is intended as the final target for a project that uses ANTLR, if that target happens to be something other than Java, a separate Java-compatible grammar must still be produced (with all the declarations removed that would otherwise express the various states the parser might require during visits or listening) if one wants to use the development tools that interoperate with ANTLR. This makes it much harder to verify by regression testing that the grammar actually works, as the intended target grammar and the Java one might diverge.
Sorry to somewhat disagree, but I use superclasses to address that need with the same grammar in Java, C#, Python and JavaScript. I restrict myself to predicates, which is the only thing the parser needs to know. Agree that I would not be able to use the same syntax in C++ and other languages, but luckily I don't have that requirement.
Which is, as I said, target dependency. To import a desired superclass into the parser, you need to use something like `import <x>` in Java, while in C++ you may need to use at least one line of the form `#include "file.h"`. I can imagine that you might be able to get away with using the same syntax as Java for importing a superclass name in both C# and Python in this regard, although I thought that the syntax for importing a class in JavaScript was something like `import <x> from "filename.js"`, which is quite a bit different.
As you say, it is lucky you don't have the requirement of needing to use C++, but if C++ does happen to be the intended target, unless the grammar is trivial it is impossible to use the same grammar file both for a C++ target and as the file to still be used directly by the development tools written for ANTLR for regression testing. I mean no disrespect, but I hope you can appreciate that saying "it works for me" doesn't actually address the problem.
Well, no disrespect either, but "you can't even use superclasses to try and remove target dependency" was misleading since you can for most supported targets. If the C++ target does not include superclass.h in parser.h then maybe that’s actually a bug?
The Java target does not import the superclass from the necessary package either, unless you impose the requirement that the files produced by ANTLR always be in the same package as the superclass, in which case the import statement isn't required. This isn't always practical, however. As for C++, do you then impose the requirement that the superclass must be in a file called "superclass.h"? Possible, but again, not always practical. Should the filename be all lower case although the class name begins with a capital letter or uses camel case, for example?
And the problem remains for local variables declared as `locals [ ... ]` in a rule. There is just no way to do this portably between languages that use different names or syntaxes for declarations. It's true that it works for "most" supported targets, but at the end of the day, as long as targets exist that it doesn't work for, it's still not target-language independent.
I feel like the tool could benefit from an option, specified as a grammar option or on the command line, to simply exclude any declarations that mention types, so that even if a grammar was written for a particular target that deviates enough from Java to otherwise cause problems, the base ANTLR development tools could still be used to analyze it
we do impose that the files produced by antlr reside in the same java package, the same C# namespace and so forth…
and if not the case I would definitely impose that in C++ the
not sure what ‘practical’ means here; how impractical can it be to have all generated files in the same folder? It may not fit everybody’s habits, but it certainly creates a stable context for troubleshooting…
as per the variables, you are right. There’s no way. And providing one would probably break with the next target... Tbh I’ve never used variables myself. The only code I tolerate in my grammars is semantic predicates.
I think that the best compromise in that regard would be an option telling ANTLR to exclude all of the variable declarations from its output, so that the base ANTLR tools can analyze the grammar as they ordinarily would, as if it had no target-specific content. This doesn't make the grammar file target-independent, but at least having such dependencies in the file would no longer preclude the base ANTLR development tools from working with and analyzing it when that option is specified.
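As a rough illustration of what such a stripping pass could do today, outside the tool itself, here is a sketch (the regexes are naive assumptions and would mishandle nested braces; a real implementation would need a proper scanner):

```python
import re

def strip_declarations(grammar_text: str) -> str:
    """Remove locals[...]/returns[...] declarations and simple {...}
    actions so generic tools can analyze a target-specific grammar.
    Semantic predicates ({...}?) are deliberately kept."""
    # Drop locals/returns declarations attached to rules.
    text = re.sub(r'\b(locals|returns)\s*\[[^\]]*\]', '', grammar_text)
    # Drop single-level inline actions, but keep predicates ({...}?).
    text = re.sub(r'\{[^{}]*\}(?!\?)', '', text)
    return text

print(strip_declarations(
    "expr returns [String value] : ID { $value = $ID.text; } ;"))
```

Predicates survive the pass because they are the one piece of embedded code the parser genuinely needs at analysis time.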
Just a thought: Haxe can be used as such a language. It's the universal language that has a lot of targets.
mmm, that would require embedding the Haxe VM in every target... kind of a show stopper. What we need is a language that translates to any target language, which imho is out of reach in the very specific context of ANTLR-generated lexers and parsers
Not necessary Haxe VM. Its compiler can convert code to different languages (C#, Java, JavaScript, Python, C++, etc). The full list is here: https://haxe.org/documentation/introduction/compiler-targets.html
what we need is a language that translates to any target language, which imho is out of reach in the very specific context of ANTLR generated lexers and parsers
Yes, we need it. The specific part is the parser API, but that's not a big problem (like the string API included by default in any language). Also, this language should eventually be Turing-complete if we want to get rid of superclasses at all.
At least we could use part of the Haxe compiler, or just the Haxe grammar, and convert the code fragments ourselves with a correct mapping between generated and input code. Or develop our own language (but in my opinion it's better to at least reuse the grammar of an existing language).
Sorry but not sure I agree with your findings. Haxe is able to compile code to different targets, but what is needed is source code fragments to be embedded in the lexer or parser code, which happens long before compilation. And not sure why you would want to get rid of superClasses ?
Hi. Yep. We considered an imperative actions language years ago; I even had a name; NIL= neutral imperative language. In the end, we abandoned the effort.
Haxe is able to compile code to different targets, but what is needed is source code fragments to be embedded in the lexer or parser code, which happens long before compilation.
Yes, I know. That's why I suggest using only the grammar of such a "universal" language, not its complete infrastructure. Maybe you are right and it doesn't make much sense to use such a language rather than a DSL.
And not sure why you would want to get rid of superClasses?
Because it duplicates logic if you want to use several targets, which entails more errors and more effort during grammar development. But as a first step it's not necessary to get rid of superclasses; just use a limited syntax for universal actions and predicates.
Hi. Yep. We considered an imperative actions language years ago; I even had a name; NIL= neutral imperative language. In the end, we abandoned the effort.
Good idea. Do you have any existing work on it? Sometime later I'm going to experiment with such a language.
@KvanTTT can't remember all the reasons but it just seemed to not be that useful in practice except for simplest predicates / symbol table stuff.
@parrt But how are those situations themselves not important or useful?
for real languages it turns out it wasn't enough to do anything but trivial problems. The minute you need conditionals and things like that you start to expose a need for a full language. You might as well pick a language, restricted a bit, and then translate that to various targets if you want.
It's easy enough to do yourself. Just leave `{@marker}`-style placeholders in the code and do a preprocessor pass over the grammar to fill in the various things you need for a variety of languages.
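A toy version of that preprocessor pass might look like this (the marker name and the substitution table are made up for illustration):

```python
import re

# Per-target snippets for each named marker (illustrative values).
MARKERS = {
    "println": {
        "java":       'System.out.println("hi");',
        "python":     'print("hi")',
        "javascript": 'console.log("hi");',
    },
}

def preprocess(grammar: str, target: str) -> str:
    """Replace every {@name} placeholder with that target's snippet."""
    return re.sub(r'\{@(\w+)\}',
                  lambda m: MARKERS[m.group(1)][target],
                  grammar)

src = "r : 'hello' { {@println} } ;"
print(preprocess(src, "python"))   # r : 'hello' { print("hi") } ;
```

Running the pass once per target yields one filled-in grammar per runtime from a single annotated source.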
It's easy enough to do yourself. Just leave `{@marker}`-style placeholders in the code and do a preprocessor pass over the grammar to fill in the various things you need for a variety of languages.
BTW, there is another missing feature in ANTLR (most likely in StringTemplate). Unfortunately, it's not possible to extract the text span of the original action/predicate from the generated code, because ANTLR does not provide a mapping between the generated code and the input for actions/predicates. It's possible to use marker comments for actions/predicates, e.g. `{/*marker1_start*/marker/*marker1_stop*/}`, find them in the generated code, and associate them with the input markers. But it looks ugly and quite complicated.
Such a feature would improve the grammar development experience because all errors (including errors in actions and predicates) would be shown in the input grammar, not in the generated code.
Long ago I think I did this for C using the preprocessor `#line` command. :) A cool idea. Would it just be generating comments into the generated code at the start/end of action code?
Long ago I think I did this for C using the preprocessor #line command.
I think it's not a very good idea, because the `#line` directive only provides line numbers, not columns. Also, it won't work with targets other than C.
Would it just be generating comments into generated code at start/end of action code?
Yes. Moreover, after the mapping is built, the marker comments can be removed both from the generated code and from the grammar. I've implemented such an idea in my prototype project for grammar development (a test is also available). The implementation is quite complicated and not very efficient; that's why it's better to provide such a mapping directly via StringTemplate.
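For illustration, the marker-comment mapping could be built roughly like this (the `/*mN_start*/ ... /*mN_stop*/` marker format is my own assumption, not the one used in the prototype project):

```python
import re

def build_mapping(generated_code: str):
    """Map each marker id to (start, end, text) of the wrapped action in
    the generated code, so target-compiler errors can be traced back to
    the corresponding action in the source grammar."""
    pattern = re.compile(r'/\*(m\d+)_start\*/(.*?)/\*\1_stop\*/', re.DOTALL)
    return {m.group(1): (m.start(2), m.end(2), m.group(2))
            for m in pattern.finditer(generated_code)}

generated = 'boolean flag = /*m1_start*/version >= 5/*m1_stop*/;'
print(build_mapping(generated)["m1"][2])   # version >= 5
```

Once both sides (grammar and generated code) are scanned this way, marker ids pair the two sets of spans, and the comments can then be stripped.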
Interesting. Anyway, yeah, unlikely that I can manage any new features like this.
What if we live with generic code inside semantic predicates, but customize their implementation?
If we just use `{ identifier }?` or `{ func() }?`, it probably solves the issue for 99% of the languages, and you can write only the language-specific section separately.
```antlr
stat
    : { java5() }? 'goto' ID ';'
```
Then have:
```antlr
@parser::members::java {
    boolean java5() {
        ...
    }
}

@parser::members::cpp {
    bool java5() {
        ...
    }
}
```
Or a similar syntax? (Applied to other `@parser::` and `@lexer::` tags as well.) The existing generic `@parser::members` will keep working as-is, preserving backward compatibility.
@udif It looks like a good idea! Not an ideal solution, but quite workable.
@parrt we can use such new syntax to consolidate our runtime tests, see https://github.com/antlr/antlr4/pull/3775
I've considered such an imperative language multiple times and rejected it each time after I thought about it. sorry.
@udif doesn't suggest a universal language, but a way to handle all runtime-specific code within one grammar file. It's especially useful for new tests that need to cover target-specific code that is not being tested yet.
Yep. That's what I considered and rejected multiple times. Are you going to have operators? Soon you have yet another language to maintain.
Yep. That's what I considered and rejected multiple times. Are you going to have operators? Soon you have yet another language to maintain.
No operators, just a single function call, that should be mappable to Java, C++, Python or any reasonable language I can think of. Everything else is handled in language-specific sections.
No operators, just a single function call, that should be mappable to Java, C++, Python or any reasonable language I can think of. Everything else is handled in language-specific sections.
Yes. Maybe some minor fixes are required (to handle `$self`, `$lexer`, or `$parser` in all targets, see https://github.com/antlr/antlr4/blob/master/doc/actions.md#parser-rule-attributes), but they should be fixed anyway since such functionality is already present in ANTLR. Function call syntax is the same for all runtimes that we support.
Function calls won't be enough. You don't have access to context of invoking rule. Plus I'm not adding more complexity to the tool. Feel free to make a subclass and override some functions. Should work for most languages.
Indeed subclassing is the way to go
@parrt could you please move this issue to the GitHub discussions?
I like ANTLR very much, but in my opinion one unpleasant thing still exists in this tool.
I propose the following simplified syntax constructions for a language that would be translated to the target language (Java, C#, Python, JavaScript) during the lexer/parser generation step, i.e. a Unified Actions Language (UAL):

- `la(-1)`: equivalent to `_input.LA(-1)` for Java, `_input.La(-1)` for C#, `self._input.LA(1)` for Python.
- `printLn("str")`: equivalent to `System.out.println("str")` for Java, `System.Console.WriteLine("str")` for C#, `console.log("str")` for JavaScript, `print 'str'` for Python.
- `def nextToken()`: equivalent to `public Token nextToken()` for Java, `public override IToken NextToken()` for C#, `def nextToken()` for Python.
- `base.NextToken()`: equivalent to `super.nextToken()` for Java, `base.NextToken()` for C#, `super(self).nextToken()` for Python.
- `a == "str"`: equivalent to `a.equals("str")` for Java, `a.Equals("str")` for C#, `a == b` for Python, `a === b` for JavaScript.

UAL code could be wrapped in a construction like `{{ UAL code }}` (or other brackets). Context-dependent predicates would also be available via the `$` syntax. Usual language-dependent actions would remain available via the usual `{ }` syntax (but not recommended).

Advantages:
This ugly ECMAScript grammar, with its large amount of duplicated code, could be transformed into the new unified form.
For example, suppose I use a semantic predicate that references `java5`, but I did not declare the `java5` variable in an `@members {boolean java5 = true;}` section. In this case ANTLR would throw the error "java5 member is not declared". I also mean that differently formatted code fragments would be translated to the same actions with correct formatting (without useless spaces and with a uniform code style).
Disadvantages:
So, the declarative approach (tokens and rules) and the imperative approach (actions) would be linked together by UAL.
It would be really cool to bring a Unified Actions Language to ANTLR!
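To make the proposal tangible, the simplest UAL constructs listed above could start life as a plain lookup table; anything with real expressions would of course need an actual UAL parser (this sketch is purely illustrative):

```python
# Toy UAL-to-target table for a few constructs from the proposal.
# A real implementation would parse UAL rather than match whole strings.
UAL_TABLE = {
    'la(-1)': {
        'java':   '_input.LA(-1)',
        'csharp': '_input.La(-1)',
        'python': 'self._input.LA(-1)',
    },
    'printLn("str")': {
        'java':       'System.out.println("str");',
        'csharp':     'System.Console.WriteLine("str");',
        'javascript': 'console.log("str");',
        'python':     'print("str")',
    },
}

def translate(ual_fragment: str, target: str) -> str:
    """Translate a known UAL fragment; fall back to the fragment itself."""
    return UAL_TABLE.get(ual_fragment, {}).get(target, ual_fragment)

print(translate('la(-1)', 'csharp'))   # _input.La(-1)
```

The fallback behavior would let ordinary target-specific actions coexist with UAL fragments during a transition period.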