Closed by stof 5 years ago
This looks promising :)
php test/benchmark/run.php (current master, i.e. 182f34d)
Loading: 101.72620534897
Writing: 37.083342075348
php test/benchmark/run.php (this PR)
Loading: 69.69865322113
Writing: 37.433831691742
php test/benchmark/run_native.php (same benchmark using DOMDocument::loadHTML and DOMDocument::saveHTML instead)
Loading: 10.595810413361
Writing: 3.5749840736389
And for reference, here is the benchmark running on 2.4.0:
Loading: 127.82767772675
Writing: 37.827260494232
This is indeed quite promising (note that all my optimizations since 2.4.0 focus on the loading part only, which is why there is not much improvement on the writing side).
For loading, 2.4.0 was 12x slower than the native parser. This PR brings that down to about 6.5x slower than the native parser.
I'll take the time to test this myself tomorrow, but it looks great! Thanks a lot!
And for the first time since I started this optimization work, the DOMTreeBuilder appears in the hot path defined by blackfire, instead of being entirely dominated by the Tokenizer :smile:
Note that I still have a few ideas to keep going after this one (though none as big as this one).
Now that I see it, this looks so obvious :)
I'll take the time to test this myself tomorrow, but it looks great!
impatience :)
Nice to see this benchmark:
v2.3.1
$ php test/benchmark/run.php 10
Loading: 230.20720481873
master
$ php test/benchmark/run.php 10
Loading: 66.839385032654
(php 7.2)
That is roughly 3.4 times faster! (And my guess is that there is still room for improvement, for example in the Tokenizer::attribute() function, or by moving readUntilSequence into the scanner class.)
Instead of processing text tokens one by one in the main loop, text is now processed in batches until the next special token (< and &, which have special handling in the main loop, and NUL characters, which need to report a parse error).
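To illustrate the idea, here is a minimal standalone sketch. It is not the code from this PR: splitTextRuns() and the event names are hypothetical, and the real Tokenizer and Scanner classes are organized differently. It only shows how a run of plain text can be consumed in one step with strcspn(), stopping at <, & or NUL.

```php
<?php
// Hypothetical helper for illustration only; not part of the library's API.
// Splits input into batched text runs and single "special" characters.
function splitTextRuns(string $data): array
{
    $events = [];
    $pos = 0;
    $len = strlen($data);

    while ($pos < $len) {
        // Length of the run starting at $pos that contains none of <, & or NUL.
        $run = strcspn($data, "<&\0", $pos);

        if ($run > 0) {
            // Emit the whole run as a single text event instead of char by char.
            $events[] = ['text', substr($data, $pos, $run)];
            $pos += $run;
            continue;
        }

        // We are sitting on a special character: handle it individually.
        $char = $data[$pos];
        if ($char === "\0") {
            // NUL in text has to be reported as a parse error.
            $events[] = ['parse-error', 'NUL'];
        } else {
            // < and & would be handed to the tag / character reference logic.
            $events[] = ['special', $char];
        }
        $pos++;
    }

    return $events;
}
```

With this shape, the main loop runs once per text run and once per special character, rather than once per character of text, which is where the loading time is saved.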
https://blackfire.io/profiles/compare/8d7277d0-e2ed-40cf-b9b6-bffa6a523ae6/graph
There is a 51% improvement there