commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Investigate MyHTML parser #53

Open sylvinus opened 8 years ago

sylvinus commented 8 years ago

https://lexborisov.github.io/myhtml/

They are reporting an impressive 10x speedup over Gumbo: http://lexborisov.github.io/benchmark-html-persers/

There are a few concerns beyond performance (testing on huge datasets, security, Python bindings, ...), but a 10x improvement is large enough that we should look into it!
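
One way to check the reported speedup on our own pages would be a small parser-agnostic timing harness. The sketch below is only an illustration: `time_parser`, `compare`, and `stdlib_parse` are hypothetical names, and the standard-library `html.parser` is used purely as a stand-in, since the actual Gumbo and MyHTML Python bindings (and their call signatures) are not assumed here. Any callable that takes an HTML string could be plugged in.

```python
import time
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable


def time_parser(parse: Callable[[str], object],
                documents: Iterable[str],
                repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) to parse every document once."""
    docs = list(documents)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for doc in docs:
            parse(doc)
        best = min(best, time.perf_counter() - start)
    return best


def compare(parsers: Dict[str, Callable[[str], object]],
            documents: Iterable[str]) -> Dict[str, float]:
    """Time each named parser on the same documents."""
    docs = list(documents)
    return {name: time_parser(fn, docs) for name, fn in parsers.items()}


def stdlib_parse(html: str) -> None:
    # Stand-in only: swap in the real Gumbo / MyHTML entry points here.
    parser = HTMLParser()
    parser.feed(html)
    parser.close()


if __name__ == "__main__":
    # Synthetic sample; a real test should use a large crawl sample instead.
    sample = ["<html><body><p>hello</p></body></html>"] * 1000
    for name, seconds in compare({"stdlib": stdlib_parse}, sample).items():
        print(f"{name}: {seconds:.4f}s")
```

Taking the best of several runs reduces noise from other processes, but it only measures speed. The other concerns (robustness on huge datasets, security) would need separate tests.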