doctrine / lexer

Base library for a lexer that can be used in Top-Down, Recursive Descent Parsers.
https://www.doctrine-project.org/projects/lexer.html
MIT License
11.05k stars 60 forks source link

Replace object-like array with class #75

Closed greg0ire closed 1 year ago

greg0ire commented 1 year ago

The new class is templatable, which should enable us to specify that the ORM Lexer is an Lexer<self::T_*>, and in the future, define an enum called TokenType in the ORM, and switch to Lexer<TokenType>.

Also, I haven't check but I believe it's better memory-wise to use objects rather than arrays.

The assumption I'm making here is that token are used as object-like arrays, and not with e.g. foreach or count()

derrabus commented 1 year ago

Also, I haven't check but I believe it's better memory-wise to use objects rather than arrays.

I'd be surprised, tbh. My expectation would be equal memory usage but worse runtime performance due to more function calls. But maybe we should run a benchmark if performance optimization is what we aim at.

greg0ire commented 1 year ago

IIRC the reason is when you have a class, you have more information on how much memory is needed because the number of properties is fixed an the properties are typed. I'll try to benchmark this.

EDIT: https://stackoverflow.com/questions/2193049/php-objects-vs-arrays-performance-comparison-while-iterating

(that's for a very old version of PHP though)

greg0ire commented 1 year ago

One last thing. Feel free to merge after fixing this.

Thanks! I will do a benchmark before that :)

greg0ire commented 1 year ago

@derrabus if I use the benchmark at https://github.com/greg0ire/lexer/commit/00ada52061c55a1592ebccd44c68e14848604f62, I get the following result:

\Doctrine\Lexer\Benchmark\LexerBench

    benchLexer..............................I0 - [Mo10.754μs vs. Mo13.608μs] -20.97% (±0.00%)

Subjects: 1, Assertions: 0, Failures: 0, Errors: 0
+------------+------------+-----+--------+-----+---------------+------------------+--------------+
| benchmark  | subject    | set | revs   | its | mem_peak      | mode             | rstdev       |
+------------+------------+-----+--------+-----+---------------+------------------+--------------+
| LexerBench | benchLexer |     | 100000 | 1   | 1.723mb 0.00% | 10.754μs -20.97% | ±0.00% 0.00% |
+------------+------------+-----+--------+-----+---------------+------------------+--------------+

If I then switch the code of the benchmark to ->type instead of ['type'], I obtain this result:

\Doctrine\Lexer\Benchmark\LexerBench

    benchLexer..............................I0 - [Mo7.722μs vs. Mo13.608μs] -43.25% (±0.00%)

Subjects: 1, Assertions: 0, Failures: 0, Errors: 0
+------------+------------+-----+--------+-----+---------------+-----------------+--------------+
| benchmark  | subject    | set | revs   | its | mem_peak      | mode            | rstdev       |
+------------+------------+-----+--------+-----+---------------+-----------------+--------------+
| LexerBench | benchLexer |     | 100000 | 1   | 1.723mb 0.00% | 7.722μs -43.25% | ±0.00% 0.00% |
+------------+------------+-----+--------+-----+---------------+-----------------+--------------+

Conclusion: no visible impact on memory, but there is a good performance impact :slightly_smiling_face:

derrabus commented 1 year ago

Impressive!

greg0ire commented 1 year ago

Despite what I thought, this might be too much of a BC-break, because already released versions of doctrine/orm have this private method with this type declaration:https://github.com/doctrine/orm/blob/65da1fe8cb5612c018dd38192f8dd26faad567a5/lib/Doctrine/ORM/Query/Parser.php#L590-L593

This could be worked around with a conflict, but then there are many packages that rely on doctrine/lexer.

malarzm commented 1 year ago

How about we keep the Token class but add an opt-in for a new return type? Internally we may use Token, but user would by default get (array) $token. 2.x could then remove the opt-in feature.