danneu / html-parser

a lenient html5 parser written in Elm
MIT License

Inspect performance #4

Open danneu opened 2 years ago

danneu commented 2 years ago

Some of the parsing in this lib was written or refactored naively. For example, I'm not sure it was a net improvement to create a text decoder that uses lookahead to be self-contained. At the very least, I need to do some performance inspection.

One possible performance quagmire is how I've introduced html entity leniency: I accept, say, "<" as text instead of requiring it to be escaped as "&lt;". Off the top of my head, this might result in massive backtracking if a string were to start with something like "<a small mouse once said", since the parser may try to read a tag first and only fall back to text after that attempt fails.
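To make the concern concrete, here is a minimal sketch of the failure mode. It is written in Python rather than Elm so it can run standalone, and it is not this library's code: `parse_lenient` and its `steps` counter are hypothetical names, and the assumed strategy is "on every `<`, scan ahead for a closing `>`; if there is none, backtrack and keep the `<` as text".

```python
def parse_lenient(html):
    """Split html into ("tag", s) and ("text", s) tokens.

    Hypothetical model of a lenient parser, not the library's implementation.
    Returns (tokens, steps), where steps counts the characters examined by
    speculative tag attempts (the lookahead work that may be thrown away).
    """
    tokens = []
    text_start = 0  # start of the text run collected so far
    steps = 0
    i = 0
    n = len(html)
    while i < n:
        looks_like_tag = (
            html[i] == "<"
            and i + 1 < n
            and (html[i + 1].isalpha() or html[i + 1] == "/")
        )
        if looks_like_tag:
            # Speculative attempt: scan ahead for the closing '>'.
            j = i + 1
            while j < n and html[j] != ">":
                j += 1
                steps += 1
            if j < n:
                # Found '>': flush pending text, then emit the tag.
                if text_start < i:
                    tokens.append(("text", html[text_start:i]))
                tokens.append(("tag", html[i:j + 1]))
                i = j + 1
                text_start = i
                continue
            # No '>' anywhere ahead: backtrack and treat '<' as plain text.
        i += 1
    if text_start < n:
        tokens.append(("text", html[text_start:]))
    return tokens, steps


if __name__ == "__main__":
    print(parse_lenient("<a small mouse once said"))
    for k in (100, 200, 400):
        _, steps = parse_lenient("<a " * k)
        print(k, steps)
```

In this model a single stray "<" only costs one extra pass over the rest of the input. The trouble starts when the input contains many of them: each failed attempt rescans to the end, so doubling the input roughly quadruples the wasted lookahead. Whether the real parser behaves this way is exactly what needs measuring.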

There are various things I could do here. Frankly, I haven't noticed much of a problem because my use case for this library so far has been small html snippets. But it's something I need to look into.

miniBill commented 8 months ago

I found a document that takes 2 whole seconds to parse. test.zip