EliotVU / UnrealScript-Language-Service

Bringing a work-in-progress intelliSense to ye olde UnrealScript :)
MIT License

[Bug]: Parser breaks on documents with special characters #162

Open EliotVU opened 1 year ago

EliotVU commented 1 year ago

Describe the bug

When the parser stumbles on special characters it will fail with the following error:

Processing pending document "file:///c%3A/.../Engine/Classes/PlaylistParserBase.uc":1, source:change.
Invalidating document "PlaylistParserBase".
building document PlaylistParserBase
PredictionMode SLL has failed, rolling back to LL.
An error was thrown while parsing document: "file:///c%3A/.../Engine/Classes/PlaylistParserBase.uc" Error: cannot consume EOF
    at UCTokenStream.consume (c:\Projecten\UnrealScriptLang\out\server.js:28905:19)
    at UCParser.skipLine (c:\Projecten\UnrealScriptLang\out\server.js:5417:25)
    at UCParser.directive (c:\Projecten\UnrealScriptLang\out\server.js:5544:22)
    at UCParser.member (c:\Projecten\UnrealScriptLang\out\server.js:5711:30)
    at UCParser.program (c:\Projecten\UnrealScriptLang\out\server.js:5581:42)
    at UCDocument.build (c:\Projecten\UnrealScriptLang\out\server.js:20658:34)
    at indexDocument (c:\Projecten\UnrealScriptLang\out\server.js:24237:18)
    at Object.next (c:\Projecten\UnrealScriptLang\out\server.js:26656:41)

Appears to be caused by the following code, where an unescaped string literal is eventually followed by a hash character:

     ...
     SpecialChars(1)=(Plain=""",Coded="&quot;")
     ...
     SpecialChars(6)=(Plain="�",Coded="&#8482;")
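
The trace above shows `UCParser.skipLine` calling `consume` after the stream has already reached EOF. Below is a minimal sketch of a line skip that stops at EOF instead of consuming it. The `Token`, `TokenStream` and `skipLine` definitions are hypothetical, simplified stand-ins written for illustration; they are not the extension's actual code, and the real fix may need to look different.

```typescript
// Hypothetical, simplified stand-ins for the parser's token stream.
// This is NOT the actual UCTokenStream/UCParser implementation.
type Token = { type: "EOF" | "NEWLINE" | "OTHER"; text: string };

class TokenStream {
    tokens: Token[];
    index = 0;

    constructor(tokens: Token[]) {
        // Always terminate the stream with a single EOF token.
        this.tokens = [...tokens, { type: "EOF", text: "" }];
    }

    // Look at the current token without consuming it.
    peek(): Token {
        return this.tokens[Math.min(this.index, this.tokens.length - 1)];
    }

    // Mirrors the failure in the trace: EOF must never be consumed.
    consume(): void {
        if (this.peek().type === "EOF") {
            throw new Error("cannot consume EOF");
        }
        this.index++;
    }
}

// Skip the rest of the current line, stopping at a newline or at EOF.
// Returns the number of tokens that were skipped.
function skipLine(stream: TokenStream): number {
    let skipped = 0;
    while (stream.peek().type !== "NEWLINE" && stream.peek().type !== "EOF") {
        stream.consume();
        skipped++;
    }
    return skipped;
}
```

With this guard, a directive on the last line of a file (no trailing newline) ends the skip at EOF rather than throwing.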

Screenshots

No response

EliotVU commented 1 year ago

Weird, as far as UT2004 goes, the string is actually escaped:

     SpecialChars(0)=(Plain="&",Coded="&amp;")
     SpecialChars(1)=(Plain="\"",Coded="&quot;")
     SpecialChars(2)=(Plain=" ",Coded="&nbsp;")
     SpecialChars(3)=(Plain="<",Coded="&lt;")
     SpecialChars(4)=(Plain=">",Coded="&gt;")
     SpecialChars(5)=(Plain="©",Coded="&copy;")
     SpecialChars(6)=(Plain="™",Coded="&#8482;")
     SpecialChars(7)=(Plain="®",Coded="&reg;")
     SpecialChars(8)=(Plain="'",Coded="&apos;")

PolaricEntropy commented 8 months ago

I had this happen in the following file, that does not seem to contain special characters: DeusExText.zip

Shtoyan commented 8 months ago

> I had this happen in the following file, that does not seem to contain special characters: DeusExText.zip

Did a quick check out of interest: the extension starts to work when you comment out the exec directive line.

EliotVU commented 8 months ago

@Shtoyan Appending function test(); after the exec line seems to work too. I suppose this is because ANTLR is then able to find an alternative rule to match :|

FYI: This is caused by the hacky code in the ANTLR parser:
https://github.com/EliotVU/UnrealScript-Language-Service/blob/22b6fbb79476f374c2d7b7c5884cacb7bc1e946c/grammars/UCParser.g4#L12
https://github.com/EliotVU/UnrealScript-Language-Service/blob/22b6fbb79476f374c2d7b7c5884cacb7bc1e946c/grammars/UCParser.g4#L239

EliotVU commented 8 months ago

Unfortunately I have to admit I'm unable to fix this issue in a way that ANTLR would understand.

A proper solution to this #directive parsing would be to switch the lexer's channel when a statement directive has been detected, but ANTLR is incapable of this as far as I know.

Shtoyan commented 8 months ago

No way to just ignore all #directives during parsing/lexing?
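
As a rough illustration of what "ignoring all #directives" could mean in practice, here is a minimal sketch that blanks out directive lines before the text reaches the lexer. `blankDirectiveLines` is a hypothetical helper, not part of the extension, and it assumes a directive is any line whose first non-blank character is `#`. It would also blank such a line inside a multi-line comment, and it throws away directives the language service may actually want to understand, so treat it as a sketch of the idea rather than a fix.

```typescript
// Hypothetical pre-processing step (not part of the extension):
// replace each line that starts with '#' (after optional spaces/tabs)
// with spaces of the same length. Line breaks are left untouched, so
// line numbers and character offsets of the remaining code are preserved.
function blankDirectiveLines(source: string): string {
    return source.replace(/^[ \t]*#[^\r\n]*/gm, (match) => " ".repeat(match.length));
}

// Example input: an exec directive between two ordinary declarations.
const example = "class Foo extends Object;\n#exec OBJ LOAD FILE=Bar\nvar int Baz;";
const cleaned = blankDirectiveLines(example);
```

Because the replacement has the same length as the original line, diagnostics and symbol ranges computed on `cleaned` would still map onto the original document.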