Open PavelFil opened 2 months ago
I was able to reproduce this with a synthetic test.
Turns out `hQuery/Parser/HTML::parse()` is not linear with respect to the number of tags in the document 🤔.
In other words, `hQuery::fromHTML($html)` is affected, but not `->find('script,style')`.
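One way to check the non-linear behaviour is to time the parse step on synthetic documents with a doubling number of tags. The sketch below is only an illustration: `makeSyntheticHtml()` and `measureScaling()` are hypothetical helper names, not part of hQuery, and the closure in the usage part is a stand-in that should be replaced with the real `hQuery::fromHTML($html)` call.

```php
<?php
// Hypothetical helper: builds a synthetic document with $tags <p> elements.
function makeSyntheticHtml(int $tags): string
{
    return '<html><body>' . str_repeat('<p class="x">text</p>', $tags) . '</body></html>';
}

// Hypothetical helper: times $parse on documents of increasing tag count.
// Returns an array mapping tag count => elapsed seconds.
function measureScaling(callable $parse, array $tagCounts): array
{
    $timings = [];
    foreach ($tagCounts as $n) {
        $html = makeSyntheticHtml($n);
        $start = hrtime(true);                            // nanoseconds, monotonic clock
        $parse($html);
        $timings[$n] = (hrtime(true) - $start) / 1e9;     // convert to seconds
    }
    return $timings;
}

// Usage: the closure is a stand-in; swap in
// fn (string $html) => hQuery::fromHTML($html) to measure the parser itself.
$timings = measureScaling(fn (string $html) => strlen($html), [1000, 2000, 4000]);
foreach ($timings as $n => $seconds) {
    printf("%6d tags: %.6f s\n", $n, $seconds);
}
```

If the parser were linear, doubling the tag count should roughly double the time; a factor close to four per doubling would point to quadratic behaviour.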
I'll try to analyze the code and improve it.
Thank you for the challenge!
I suspect the issue is the heavy use of `strspn` and `strcspn` for parsing HTML. I had assumed they were very fast, but after reading the implementation I realized that each call initializes a 256-byte array, even for a small character list. This doesn't scale well.
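To get a rough feel for the per-call cost described above, one could walk a document with `strcspn` and divide the elapsed time by the number of calls. This is a minimal sketch, not the parser's actual loop: `benchStrcspn()` is a hypothetical name, the masks are arbitrary examples, and the numbers will depend on the PHP version and machine.

```php
<?php
// Hypothetical micro-benchmark: scans $html with repeated strcspn() calls
// and reports how many calls were made and the average time per call.
function benchStrcspn(string $html, string $mask): array
{
    $len = strlen($html);
    $pos = 0;
    $calls = 0;
    $start = hrtime(true);
    while ($pos < $len) {
        // Length of the run starting at $pos that contains no byte from $mask;
        // the +1 steps over the matched byte (or past the end of the string).
        $pos += strcspn($html, $mask, $pos) + 1;
        $calls++;
    }
    $elapsedNs = hrtime(true) - $start;
    return [
        'calls'       => $calls,
        'ns_per_call' => $calls > 0 ? $elapsedNs / $calls : 0.0,
    ];
}

// Usage: many short spans, so any fixed per-call setup cost dominates.
$html = str_repeat('<p class="x">text</p>', 50000);
foreach (['<', " \t\r\n<>/=\"'"] as $mask) {
    $r = benchStrcspn($html, $mask);
    printf("mask length %2d: %d calls, %.1f ns/call\n", strlen($mask), $r['calls'], $r['ns_per_call']);
}
```

Because each span here is only a few bytes long, the average is mostly fixed overhead per call rather than scanning work, which is the case that matters for a tag-by-tag parser.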
I have a huge HTML document (2 MB):
And the request below takes 78 seconds:
In a browser, the equivalent request takes less than 0.2 seconds.