nabsul closed this issue 7 years ago.
I've been able to reproduce this issue with a simpler snippet:
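```csharp
using System;
using System.IO;
using System.Text;
using Devsense.PHP.Syntax; // assumed namespace of Lexer, Tokens, LanguageFeatures

string stringToParse = "<?php $x = <<<EOTXT\n \"closure\" => Closure {{$r}\n }";
Lexer lexer = new Lexer(new StringReader(stringToParse), Encoding.UTF8, null, LanguageFeatures.Php71Set);
Tokens token;
while ((token = lexer.GetNextToken()) != Tokens.EOF)
{
    Console.WriteLine($"Type: {token}"); // print each token type until EOF
}
```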
Thank you, @nabsul!
I'm definitely interested in learning what causes this issue; please let me know if I can help in any way. Hopefully it's not something deep within the .NET Core library.
Either way, as a temporary workaround in my project (to be published this week), I'm simply aborting when `TokenPosition.Start` gets stuck in the same place. In my situation this is fine, because the project is analytics-oriented and a few failed parses out of several thousand are no big deal.
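A minimal sketch of that kind of guard, written in C# to match the snippet above. It is not part of the parser library: `StallGuard` and `ReadAll` are hypothetical names, and the sketch assumes every token comes with an integer start offset. Instead of looping forever, it throws once the start offset has failed to advance for `maxRepeats` consecutive tokens.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the parser library.
public static class StallGuard
{
    // Reads tokens until isEnd(token) is true. nextToken returns the token
    // together with its start offset. If the start offset stays the same for
    // maxRepeats further tokens after the first one, the lexer is assumed to
    // be stuck and an exception is thrown instead of looping forever.
    public static List<T> ReadAll<T>(
        Func<(T Token, int Start)> nextToken,
        Func<T, bool> isEnd,
        int maxRepeats = 100)
    {
        var tokens = new List<T>();
        int lastStart = -1;
        int repeats = 0;

        while (true)
        {
            var (token, start) = nextToken();
            if (isEnd(token))
            {
                break;
            }

            if (start == lastStart)
            {
                // No progress since the previous token.
                repeats++;
                if (repeats >= maxRepeats)
                {
                    throw new InvalidOperationException(
                        $"Lexer made no progress at offset {start}; aborting.");
                }
            }
            else
            {
                lastStart = start;
                repeats = 0;
            }

            tokens.Add(token);
        }

        return tokens;
    }
}
```

With the lexer from the snippet above, `nextToken` could be something like `() => (lexer.GetNextToken(), lexer.TokenPosition.Start)` and `isEnd` could be `t => t == Tokens.EOF`. The `TokenPosition.Start` member is taken from the comment above and is assumed to be an `int` offset; it has not been verified against the library. The threshold leaves some slack in case a lexer legitimately emits a few zero-length tokens at the same offset.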
Hi @nabsul, I have tested the C# snippet you sent, and it terminates properly for me. It reports a syntax error (unfinished heredoc), but it does not loop indefinitely. Could you please attach the original file that caused the issue, preserving its encoding, character set, everything? It is difficult to reproduce otherwise, because macOS can differ in line endings and sometimes in encoding. Thanks.
@michalbrabec It might take a day or two, but I'll whip up a .NET Core command-line project that demonstrates the issue, with a PHP file included and everything.
@michalbrabec Here's a .NET Core project that should easily reproduce the error I'm describing: https://github.com/nabsul/devsense-parser-test
I followed exactly these steps to run this program on my MacBook Pro and got the infinite loop. Does it run fine on your Mac?
I'm on .NET Core version 1.0.1. Can you confirm your version?
It'll take me a bit more time to dig up the exact file that caused me trouble. If you still need it, I'll look for it later today.
The issue is reproducible on WSL (Win10 1703 with Ubuntu 16.04) with `dotnet --version` 1.0.4.
PS: This might (or might not) come in handy.
Thanks, @nabsul, that would help us a lot. Thanks @petrol for the link; I will look into it.
@michalbrabec File added to the repo: https://github.com/nabsul/devsense-parser-test
Still reproducible on .NET Core 2.0 with the latest Parser (1.3.49), on Linux via WSL.
The token it gets stuck on is the start of the heredoc, and it's "solvable" by putting a single space between it and the `\n` newline there.
Similarly, the FileTest can be "fixed" by adding a single space after each `<<<EOTXT`. Thus it seems that the bug lies in the lexer's end-of-line detection on Linux/macOS (not that surprising given the CRLF vs. LF line-ending differences) while it is finding the end of `T_START_HEREDOC`.
However, the "fix" makes the lexer skip `T_START_HEREDOC` entirely, which is kind of logical since, AFAIK, it is illegal to have a space after it.
Editing the input text before feeding it into the parser seems kind of hacky. Is there really no way to solve this in the parser?
I'm not saying it's a solution to the problem; it most certainly isn't. I just wanted to provide some more info to help the owners fix it.
While GitHub says I'm a contributor, I'm definitely not the person to fix it; I've just contributed a bit of code needed for the PeachPie project.
(Sent by email in reply to Nabeel Sulieman's comment above, posted Sep 3, 2017 at 21:43.)
`\0` characters are "ignored" in string comparisons on Linux (https://github.com/dotnet/coreclr/issues/2051).
Fixed in https://github.com/DEVSENSE/Parsers/commit/d86d5e4c861cd582c898e4ad009ca1df06db48fe by doing `EndsWith` properly.
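The linked coreclr issue hints at why only Linux/macOS looped: a culture-sensitive comparison, which is what `string.EndsWith(string)` performs by default, may treat `\0` as ignorable under ICU, so a suffix check for `"\0"` can succeed even when the text does not end with a NUL character. A minimal sketch of the difference follows. It assumes the bug was such a culture-sensitive suffix check; the linked commit may fix it differently, and `SuffixCheck` is a hypothetical name.

```csharp
using System;

// Hypothetical helpers contrasting the two kinds of suffix check.
public static class SuffixCheck
{
    // Culture-sensitive: the comparison string.EndsWith(string) uses by default.
    // Ignorable characters such as "\0" may be skipped, so the result can
    // differ between platforms (ICU on Linux/macOS, NLS on Windows in the
    // .NET Core versions discussed in this thread).
    public static bool CultureEndsWith(string text, string suffix) =>
        text.EndsWith(suffix, StringComparison.CurrentCulture);

    // Ordinal: compares UTF-16 code units exactly, so "\0" is an ordinary
    // character and the answer is the same on every platform.
    public static bool OrdinalEndsWith(string text, string suffix) =>
        text.EndsWith(suffix, StringComparison.Ordinal);
}
```

For example, `SuffixCheck.OrdinalEndsWith("EOTXT", "\0")` is false everywhere, while the result of `SuffixCheck.CultureEndsWith("EOTXT", "\0")` depends on the platform's globalization library. For a lexer scanning for a heredoc terminator, an ordinal comparison (or a direct `char` comparison) keeps the behavior identical across Windows, Linux and macOS.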
Great news, thanks @jakubmisek !
Hi,
First off, great library! I've found it extremely useful. I've run into a rather strange issue that only happens on macOS/Linux machines. The lexer is getting stuck in an infinite loop on the piece of PHP code listed below.
On Windows the parser works fine, but on Linux, right after `"closure" => Closure`, I'm getting stuck in an infinite loop of `T_ENCAPSED_AND_WHITESPACE` tokens, whereas on Windows it moves on to a `T_CURLY_OPEN` token. Any ideas why this would happen?