doctrine / lexer

Base library for a lexer that can be used in Top-Down, Recursive Descent Parsers.
https://www.doctrine-project.org/projects/lexer.html
MIT License
11.03k stars, 60 forks

ResetPosition doesn't work by token position #53

Open jaapio opened 2 years ago

jaapio commented 2 years ago

When a lexer produces tokens that span multiple characters, resetPosition() doesn't work as expected. Its position argument is an internal pointer into the lexer's tokens array, not the position exposed in the token itself.
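
To make the mismatch concrete, here is a minimal sketch that does not use doctrine/lexer at all. `scanTokens()` is a hypothetical stand-in for a lexer's scan step, and the regex is only an assumption chosen for the demo. It shows how a token's array index and its `position` value drift apart as soon as tokens are longer than one character.

```php
<?php
// Hypothetical stand-in for a lexer's scan step (not doctrine/lexer code).
// Splits the input into word tokens and records each token's character offset.
function scanTokens(string $input): array
{
    $matches = preg_split(
        '/(\w+|\S)/',
        $input,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE
    );

    $tokens = [];
    foreach ($matches as [$value, $offset]) {
        if (trim($value) === '') {
            continue; // skip whitespace between tokens
        }
        $tokens[] = ['value' => $value, 'position' => $offset];
    }

    return $tokens;
}

foreach (scanTokens('foo bar baz') as $index => $token) {
    printf("index %d: value %s, position %d\n", $index, $token['value'], $token['position']);
}
// index 0: value foo, position 0
// index 1: value bar, position 4
// index 2: value baz, position 8
```

Passing `4` (the position of `bar`) to something that expects an array index would point past the end of this three-token array.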

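    private function parseNamedReference(): string
    {
        // Character offset of the current token, not its index in the tokens array.
        $startPosition = $this->lexer->token['position'];

        // Consume all remaining tokens.
        while ($this->lexer->moveNext()) {
        }

        // Expected: rewind to the token seen on method entry.
        // Actual: the argument is used as an index into the tokens array.
        $this->lexer->resetPosition($startPosition);
        $this->lexer->moveNext();
        $this->lexer->moveNext();
    }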

In the example above I would expect resetPosition() to take me back to the position on method entry. But since my tokens span multiple characters, this doesn't work. One fix would be to key each token by its position, like this:

    $this->tokens[$match[1]] = [
        'value' => $match[0],
        'type' => $type,
        'position' => $match[1],
    ];

However, this would break stepping through the tokens with $this->position++, because the array keys would no longer be consecutive.
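
A small sketch of why offset keys and index stepping don't mix. The loop below is a simplified assumption of how a lexer might step with `$position++`, not the library's actual code, and the token data is made up for the demo.

```php
<?php
// Tokens keyed by character offset, as in the proposed fix.
$tokens = [
    0 => ['value' => 'foo', 'type' => 1, 'position' => 0],
    4 => ['value' => 'bar', 'type' => 1, 'position' => 4],
    8 => ['value' => 'baz', 'type' => 1, 'position' => 8],
];

// Simplified stepping: advance by one until no token is found at the key.
$position = 0;
$seen = [];
while (isset($tokens[$position])) {
    $seen[] = $tokens[$position]['value'];
    $position++;
}

// Stepping stops at key 1, which does not exist, so only "foo" is visited.
echo implode(',', $seen), "\n"; // foo
```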

Another solution could be to keep a map from each token's position to its index in the tokens array. This would increase memory usage, since it requires an extra array of integers.
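
The map idea could look roughly like this. `TokenStream`, `next()` and `resetToOffset()` are hypothetical names for illustration only and are not part of doctrine/lexer. The tokens stay in a consecutively indexed array so index stepping keeps working, and a second array translates offsets to indexes.

```php
<?php
// Hypothetical minimal token stream; not part of doctrine/lexer.
final class TokenStream
{
    /** @var array<int, array{value: string, position: int}> */
    private array $tokens;

    /** @var array<int, int> character offset => index in $tokens */
    private array $indexByOffset;

    private int $index = 0;

    public function __construct(array $tokens)
    {
        $this->tokens = array_values($tokens);
        // The extra array of integers: one entry per token.
        $this->indexByOffset = array_flip(array_column($this->tokens, 'position'));
    }

    /** Returns the next token, or null when the stream is exhausted. */
    public function next(): ?array
    {
        return $this->tokens[$this->index++] ?? null;
    }

    /** Rewinds to the token that starts at the given character offset. */
    public function resetToOffset(int $offset): void
    {
        if (!isset($this->indexByOffset[$offset])) {
            throw new InvalidArgumentException("No token starts at offset $offset");
        }
        $this->index = $this->indexByOffset[$offset];
    }
}

$stream = new TokenStream([
    ['value' => 'foo', 'position' => 0],
    ['value' => 'bar', 'position' => 4],
    ['value' => 'baz', 'position' => 8],
]);

// Consume everything, then rewind to the token at offset 4.
while ($stream->next() !== null) {
}
$stream->resetToOffset(4);
echo $stream->next()['value'], "\n"; // bar
```

Keeping the offset lookup in a separate method leaves the existing index-based behavior untouched, which may matter given the breaking-change concern.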

I would be happy to provide a patch to fix this issue, but I would like to have some guidance on what is expected in this library. Any change in resetPosition would be a breaking change as it would change the behavior of this lib.

jaapio commented 2 years ago

My workaround for now:

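    /**
     * Accepts a token's 'position' value and translates it to the
     * matching index in the tokens array.
     */
    public function resetPosition($position = 0)
    {
        parent::resetPosition($this->tokenPositions[$position]);
    }

    protected function scan($input)
    {
        parent::scan($input);

        // The parent's tokens array is not accessible here, so read it through reflection.
        $class = new \ReflectionClass(AbstractLexer::class);
        $property = $class->getProperty('tokens');
        $property->setAccessible(true);
        $tokens = $property->getValue($this);

        // Map each token's 'position' value to its index in the tokens array.
        $this->tokenPositions = array_flip(array_column($tokens, 'position'));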
    }

instabledesign commented 8 months ago

Hi 👋 would this PR https://github.com/doctrine/lexer/pull/12 solve your problem?