C# runtime: TokenFactory on parser is read only

C# runtime, NuGet package version 4.7.2, ANTLR version antlr-4.8-complete.jar

Hello! I have two grammar files. One is a lexer grammar (so I can use modes) and the other is a parser grammar.

I have written a custom token, inheriting from CommonToken. I created a token factory, implementing ITokenFactory.

I can set the token factory of the lexer just fine, using the below code:

lexer.TokenFactory = tokenFactory;

However, I cannot set the parser's TokenFactory property, since it is read-only.

I would expect to be able to use this code:

parser.TokenFactory = tokenFactory;

Is there something I am missing? I did search for information, and what I found about the Java runtime implies this is possible (in general), but I cannot see how to do it with the C# runtime.

Thanks in advance!

Hi,

The lexer is where tokens are created. The parser accessors are just shortcuts to the underlying lexer's token factory.

Eric

Oh. Okay. So as long as the ID numbers match for each token it's not an issue?

Speaking of, is there an easy way of ensuring the token ids match in each grammar? If I add tokens to the lexer grammar, I have to make sure I add them to the parser grammar in the exact same order.

You cannot add tokens to a parser grammar, so I'm not sure how they would not match (unless you are missing the 'parser grammar' declaration in your .g4 file?).

Okay, here's my issue, regarding the token ordering...

I have a lexer grammar that defines eight tokens. They are defined in the below order, and my Lexer.cs file has them using these integer values:

STATEMENT_START = 1
OUTPUT = 2
STATEMENT_END = 3
KEYWORD_FOR = 4
KEYWORD_ENDFOR = 5
KEYWORD_IN = 6
WHITESPACE = (No number, has a -> skip)
IDENTIFIER = 7

I have a parser grammar, which uses the tokens in the below order, and my Parser.cs file defines them with these integer values:

OUTPUT = 1
STATEMENT_START = 2
KEYWORD_FOR = 3
IDENTIFIER = 4
KEYWORD_IN = 5
STATEMENT_END = 6
KEYWORD_ENDFOR = 7

Additionally, I get warnings of implicit token creation when I execute antlr on the parser.g4 file.

When I run my test program, I check the token types the lexer produces, and they are all matched correctly. But the parser is not able to parse the input correctly. If I take the token type integers that the lexer reports and compare them to the token type integers listed in the Parser.cs file, I can see that the parser, using the integer values, is parsing it "correctly" from its perspective.

If I add this to the top of the parser grammar, it parses the output just fine. I also notice that the tokens are defined in Parser.cs with the correct integer numbers.

tokens { STATEMENT_START, OUTPUT, STATEMENT_END, KEYWORD_FOR, KEYWORD_ENDFOR, KEYWORD_IN, WHITESPACE, IDENTIFIER }

It's clear to me that:

While I do not need to specify tokens in a parser file, if I don't, it will define them for me.
The 'token type' that is passed from the lexer to the parser is a pure integer.
If the integral value of the token type doesn't match what the parser would expect, it will not parse correctly.
The tokens { } section allows me to specify a specific order of tokens in the parser, so the lexer and parser are using the same token IDs.

Currently, I am manually keeping the two token lists in sync: I take the token names from the Lexer.cs file and put them, in that order, in the tokens { } section of the parser.g4 file. This fixes my issue, but it's a pain.

Attached are sample files.

Thanks for any help you can provide.

Aha! I think I found the answer: options { tokenVocab = Lexer; }
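For anyone landing here later, this is a minimal sketch of what the tokenVocab option might look like in a split lexer/parser setup. The grammar names (TemplateLexer, TemplateParser) and the rule structure are hypothetical, guessed from the token names listed earlier in this thread; they are not taken from the attached sample files. The one thing to watch is that tokenVocab reads the .tokens file that ANTLR writes for the lexer grammar, so the lexer grammar has to be processed before the parser grammar.

```antlr
// TemplateParser.g4 (hypothetical file name)
parser grammar TemplateParser;

// Pull token names and their integer types from TemplateLexer.tokens,
// which ANTLR generates when it processes TemplateLexer.g4.
// This replaces a hand-maintained tokens { ... } list.
options { tokenVocab = TemplateLexer; }

// Hypothetical rules, for illustration only.
template
    : element* EOF
    ;

element
    : OUTPUT
    | forStatement
    ;

forStatement
    : STATEMENT_START KEYWORD_FOR IDENTIFIER KEYWORD_IN IDENTIFIER STATEMENT_END
      element*
      STATEMENT_START KEYWORD_ENDFOR STATEMENT_END
    ;
```

With this in place, the generated parser should use the same token type integers as the generated lexer, and the implicit token definition warnings should no longer appear for tokens that the lexer grammar defines.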

antlr / antlr4

C# runtime: TokenFactory on parser is read only #2726