Open binarycow opened 4 years ago
Hi,
The lexer is where tokens are created; the parser accessors are just shortcuts to the underlying lexer's token factory.
Eric
On 18 Jan 2020, at 23:01, Mike Christiansen notifications@github.com wrote:
C# Runtime, Nuget package version 4.7.2, ANTLR version antlr-4.8-complete.jar
Hello! I have two grammar files. One is a lexer grammar (so I can use modes) and the other is a parser grammar.
I have written a custom token, inheriting from CommonToken. I created a token factory, implementing ITokenFactory.
I can set the token factory of the lexer just fine, using the below code:
lexer.TokenFactory = tokenFactory;
But I cannot set the parser's TokenFactory property, since it is read-only.
I would expect to be able to use this code:
parser.TokenFactory = tokenFactory;
Is there something I am missing? I did search for information, and what I found about the Java runtime implies this is possible (in general), but I cannot see how to do it with the C# runtime.
Thanks in advance!
Oh. Okay. So as long as the ID numbers match for each token it's not an issue?
Speaking of, is there an easy way of ensuring the token ids match in each grammar? If I add tokens to the lexer grammar, I have to make sure I add them to the parser grammar in the exact same order.
You cannot add tokens to a parser grammar, so I'm not sure how they would not match (unless you are missing the 'parser grammar' declaration in your g4?).
Aha! I think I found the answer. Adding this to the parser grammar makes it use the lexer's token types: options { tokenVocab = Lexer; }
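For reference, a minimal sketch of how this could look in a parser grammar. The grammar names TemplateParser and TemplateLexer and the forStatement rule are hypothetical, made up for illustration; only the token names come from this thread. Note that the semicolon goes inside the braces.

```antlr
parser grammar TemplateParser;

// Take token types from TemplateLexer.tokens, the file the ANTLR tool
// writes when it processes the lexer grammar (so the lexer has to be
// generated first, or in the same tool invocation).
options { tokenVocab = TemplateLexer; }

// Hypothetical rule using the token names discussed in this thread.
forStatement
    : STATEMENT_START KEYWORD_FOR IDENTIFIER KEYWORD_IN IDENTIFIER STATEMENT_END
    ;
```

With the vocabulary imported this way, there should be no need for a hand-maintained tokens { } list in the parser grammar, and the implicit-token warnings should go away because every token name is already known.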
On Tue, Jan 21, 2020 at 11:44 AM Michael Christiansen < michael.a.christiansen10@gmail.com> wrote:
Okay, here's my issue, regarding the token ordering...
I have a lexer grammar that defines eight tokens. They are defined in the below order, and my Lexer.cs file has them using these integer values:
- STATEMENT_START = 1
- OUTPUT = 2
- STATEMENT_END = 3
- KEYWORD_FOR = 4
- KEYWORD_ENDFOR = 5
- KEYWORD_IN = 6
- WHITESPACE = (No number, has a -> skip)
- IDENTIFIER = 7
I have a parser grammar, which uses the tokens in the below order, and my Parser.cs file defines them with these integer values:
- OUTPUT = 1
- STATEMENT_START = 2
- KEYWORD_FOR = 3
- IDENTIFIER = 4
- KEYWORD_IN = 5
- STATEMENT_END = 6
- KEYWORD_ENDFOR = 7
Additionally, I get warnings of implicit token creation when I execute antlr on the parser.g4 file.
When I run my test program, I check the token types, and they are all matched correctly. But the parser is not able to parse the input correctly. If I take the token type integers that the lexer reports and compare them to the token type integers listed in the Parser.cs file, I can see that the parser, using the integer values, is parsing it "correctly" from its own perspective.
If I add this to the top of the parser grammar, it parses the output just fine. I also notice that the tokens are defined in Parser.cs with the correct integer numbers.
tokens { STATEMENT_START, OUTPUT, STATEMENT_END, KEYWORD_FOR, KEYWORD_ENDFOR, KEYWORD_IN, WHITESPACE, IDENTIFIER }
It's clear to me that:
- While I do not need to specify tokens in a parser file, if I don't, it will define them for me.
- The 'token type' that is passed from the lexer to the parser is a pure integer.
- If the integral value of the token type doesn't match what the parser would expect, it will not parse correctly.
- The tokens { } section allows me to specify a specific order of tokens in the parser, so the lexer and parser are using the same token IDs.
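The mismatch described in these points can be reproduced without ANTLR. Below is a minimal sketch in Python, assuming the two numberings listed earlier in this message. The dictionaries, the `parser_view` helper, and the sample token sequence are hypothetical and only model the fact that a token type crosses from lexer to parser as a bare integer.

```python
# Token types as numbered in Lexer.cs (taken from the list in this message).
LEXER_TYPES = {
    "STATEMENT_START": 1,
    "OUTPUT": 2,
    "STATEMENT_END": 3,
    "KEYWORD_FOR": 4,
    "KEYWORD_ENDFOR": 5,
    "KEYWORD_IN": 6,
    "IDENTIFIER": 7,
}

# Token types as implicitly numbered in Parser.cs (also from this message).
PARSER_TYPES = {
    "OUTPUT": 1,
    "STATEMENT_START": 2,
    "KEYWORD_FOR": 3,
    "IDENTIFIER": 4,
    "KEYWORD_IN": 5,
    "STATEMENT_END": 6,
    "KEYWORD_ENDFOR": 7,
}


def parser_view(token_name, lexer_types, parser_types):
    """Return the name the parser attaches to the integer the lexer emits."""
    type_id = lexer_types[token_name]
    names_by_id = {number: name for name, number in parser_types.items()}
    return names_by_id[type_id]


# Hypothetical token sequence a lexer might emit for a 'for' statement.
lexed = [
    "STATEMENT_START", "KEYWORD_FOR", "IDENTIFIER",
    "KEYWORD_IN", "IDENTIFIER", "STATEMENT_END",
]

# With two independent numberings, the parser sees different token names.
seen_by_parser = [parser_view(n, LEXER_TYPES, PARSER_TYPES) for n in lexed]

# With one shared numbering, every token keeps its meaning.
seen_with_shared_vocab = [parser_view(n, LEXER_TYPES, LEXER_TYPES) for n in lexed]

print(seen_by_parser)
print(seen_with_shared_vocab)
```

In this model the lexer's STATEMENT_START (1) reaches the parser as OUTPUT, which is why the parse looks "correct" from the parser's side while still being wrong for the input.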
Currently, I am manually keeping the two token lists in sync: I take the token names from the Lexer.cs file and put them, in that order, in the tokens { } section of the parser.g4 file. This fixes my issue, but it's a pain.
Attached are sample files.
Thanks for any help you can provide.