b3b00 / csly

a C# embeddable lexer and parser generator (.Net core)
MIT License

Multiple Issues with Lexer #457

Closed utech626 closed 6 days ago

utech626 commented 1 week ago

RenPyParser.zip

I'm working on testing the Lexer and have run into multiple issues:

1) The Lexer seems to stop processing files for no reason (see the log file in the attached zip; the path to the log file is renppy/debug/logs).

2) The parser crashes with an invalid reference error while processing TestFiles\recap02.rpy. The crash occurs in sly.lexer.fsm.FSMLexer line 13; at the time of the crash the variable contents are: memory = "\r\n" and index = -1. A screenshot of the call stack is attached.

3) In the file RenPyParser.Core.lexer\rpnyLexer.cs line 201, if the line is uncommented, most tokens returned are "UNDERBAR".

The test app processes several hundred files and most look good. Let me know if you have any questions.

Thanks,
Bob Brown
msteams - utech626@ultratechweb.com

Screenshot 2024-07-03 173012

b3b00 commented 1 week ago

Hello @utech626, I'm starting to look at it. Thanks a lot for providing a test case; it will surely help.

Olivier

b3b00 commented 1 week ago

@utech626, looking at your logs, issue 1 seems to be directly caused by issue 2:

2024-07-04T08:33:25.3836062+02:00 [ERR] (RenpyParser.Services.Processor) Source Line: <<<        return self.state == store.presence_state>>>
2024-07-04T08:33:25.3849016+02:00 [ERR] (RenpyParser.Services.Processor) Source Line: <<<default_state = PresenceState()>>>
2024-07-04T08:33:25.3876798+02:00 [ERR] (RenpyParser.Services.Processor) Source Line: <<<config.quit_callbacks += [Discord.close]  # type: ignore>>>
2024-07-04T08:33:25.4036382+02:00 [ERR] (RenpyParser.Services.Processor) Source Line: <<<config.after_load_callbacks += [Discord.after_load]  # type: ignore>>>
2024-07-04T08:33:25.4203320+02:00 [INF] (RenpyParser.Services.Processor) Reading for TestFiles\recap02.rpy:

It clearly shows that processing stops after failing on recap02.rpy. So I will first look at issue 2, hoping it will solve issue 1 as well.

b3b00 commented 1 week ago

@utech626, your second issue (file recap02.rpy) is solved. It was caused by bad handling of empty single-line comments. You can test it on branch issue/457. Please tell me if it solves your first issue as well. Now looking at number 3.

b3b00 commented 1 week ago

@utech626, I've found a bug that explains issue 3. The fix is already on branch issue/457; you can test it.

For issue 1, it seems there is an issue with indentation management. This part of the lexer is quite tedious, so fixing it may take a bit longer. I can still publish a NuGet version with the first 2 fixes if you want.

b3b00 commented 1 week ago

@utech626, I am still investigating, but I've found an explanation for why you have not seen the error behind your first issue. In your LexGenerator class, method LexFile, you're calling the lexer this way:

List<Token<rpnyLexer>> _tokens = _lexer.Result.Tokenize(_source.ToString()).Tokens.ToList();

As the CSLy lexer always returns the list of tokens it succeeded in scanning, you get some tokens, but the Tokenize() call in fact states that there is an error. So a better way to call the lexer would be:

public List<Token<rpnyLexer>> LexFile(string filePath)
{
    using (StreamReader _sr = new StreamReader(filePath))
    {
        StringBuilder _source = new StringBuilder(_sr.ReadToEnd());
        _source.AppendLine();

        BuildResult<ILexer<rpnyLexer>> _lexer = LexerBuilder.BuildLexer<rpnyLexer>();
        if (_lexer.IsOk)
        {
            var lexerResult = _lexer.Result.Tokenize(_source.ToString());
            if (lexerResult.IsOk)
            {
                return lexerResult.Tokens.ToList();
            }
            // even on error the lexer returns the tokens it managed to scan,
            // so IsOk must be checked before using them
            Console.WriteLine(lexerResult.Error.ToString());
            return null;
        }
        return null;
    }
}

This way you should see a lexing error message stating "Indentation error at (line 73, column 0)" (this is for file recap02.rpy).

Now I have to find why this lexer error happens.
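For context on what this error means: indentation-aware lexers typically keep a stack of open indentation widths and emit INDENT/DEDENT tokens as the width changes, and a dedent to a width that was never pushed is exactly the kind of "Indentation error" reported above. A toy Python sketch of that mechanism (an illustration only, not csly's actual implementation):

```python
def indent_tokens(lines):
    """Toy indentation tracker: yields (token, width) pairs.

    Raises ValueError on a dedent to a width that was never
    opened -- the analogue of the 'Indentation error' above.
    """
    stack = [0]  # currently open indentation widths
    for line in lines:
        if not line.strip():
            # blank lines must not affect indentation
            # (comment-only lines would need the same treatment)
            continue
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", width)
        else:
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", stack[-1])
            if width != stack[-1]:
                raise ValueError(f"indentation error: width {width} was never opened")
        # ...the line's regular tokens would be emitted here...

print(list(indent_tokens(["if x:", "    y = 1", "z = 2"])))
# [('INDENT', 4), ('DEDENT', 0)]
```

The earlier fix for empty single-line comments fits this picture: blank and comment-only lines have to be excluded from indentation tracking, or they produce spurious indentation changes.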

utech626 commented 1 week ago

Olivier,

That corrected the crashing issue: 1101 files processed, the complete set, with no crash.

Bob Brown


b3b00 commented 1 week ago

v3.2.1 is released. It does not solve all your issues, but it will help.

b3b00 commented 6 days ago

@utech626, I am working on the indentation error on recap02.rpy. Right now I have something that works quite well, and I am now trying to pass the full file collection you added in your zip. In doing so, I've found that your lexer is missing a | (pipe) token, which appears in some files (aubrey_ren.py line 36, for instance): return CharacterTrait.COMPETITIVE | CharacterTrait.TALKATIVE.

Adding [Lexeme(GenericToken.SugarToken, "|")] PIPE to your lexer solves the lexing error.

I also still have to look at the UNDERBAR issue.

b3b00 commented 6 days ago

@utech626, the ~ (tilde) token is also missing (file character_ren.py line 91): self.mood &= ~mood

b3b00 commented 6 days ago

The UNDERBAR issue comes from a conflict between [Lexeme(GenericToken.Identifier, IdentifierType.AlphaNumericDash)] IDENTIFIER = 1 and [Lexeme(GenericToken.SugarToken, "_")] UNDERBAR. That's a real bug and I will look at it quickly.

For a single _, the lexer cannot decide whether it is an UNDERBAR or an IDENTIFIER. Nevertheless, in your case the lexer tags any identifier starting with a _ as an UNDERBAR; in that case it should succeed at tagging it IDENTIFIER.
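A toy sketch of how such a conflict mis-tags identifiers (a generic illustration, not csly's internals): if a single-character sugar rule is tried before the identifier rule, any name beginning with _ is cut off after its first character:

```python
import re

# hypothetical toy rules, tried in order: sugar token first
RULES = [
    ("UNDERBAR", re.compile(r"_")),
    ("IDENTIFIER", re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")),
]

def toy_lex(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for name, rx in RULES:
            m = rx.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens

# '_foo' is split into UNDERBAR + IDENTIFIER instead of one IDENTIFIER
print(toy_lex("_foo"))  # [('UNDERBAR', '_'), ('IDENTIFIER', 'foo')]
```

Preferring the longest match would tag _foo correctly, but a lone _ would still be ambiguous between the two rules.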

b3b00 commented 6 days ago

@utech626, in fact a sugar token should not start with a character that is also a valid starting character for an identifier. The bug here is that the lexer builder does not return an error stating this. I cannot simply fix the lexer to allow this kind of token, as it would require a disambiguation process that I cannot manage without a major rework of the lexer, and I do not have enough time for this. I will publish right now a new version that fixes the indentation error issue and returns an error for the identifier / underbar conflict case.
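A minimal sketch of the kind of build-time check described here (hypothetical names, not csly's actual validation code): scan the declared sugar tokens and reject any whose first character could also start an identifier:

```python
def validate_sugar_tokens(sugar_tokens):
    """Return error messages for sugar tokens whose first character
    could also start an identifier (a letter or '_')."""
    def is_identifier_start(ch):
        return ch.isalpha() or ch == "_"

    errors = []
    for name, literal in sugar_tokens.items():
        if literal and is_identifier_start(literal[0]):
            errors.append(
                f"sugar token {name} ({literal!r}) conflicts with identifiers"
            )
    return errors

# hypothetical token set mirroring this issue
print(validate_sugar_tokens({"PIPE": "|", "TILDE": "~", "UNDERBAR": "_"}))
# flags UNDERBAR only
```

Reporting this at lexer-build time, rather than silently mis-tagging tokens, matches the behavior described above.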

b3b00 commented 6 days ago

v3.2.2 is published. Can you test it and report whether all is fine? Please do so, it is important for me.

b3b00 commented 5 days ago

@utech626, I've spent time working on your issues. Could you please take a little time to provide some feedback on the fixes?

utech626 commented 5 days ago

Olivier,

I'll do my best to test today. I was on vacation last week, so there's a lot of catching up to do this week.

Bob
