BoboTiG / ebook-reader-dict

Finally decent dictionaries based on Wiktionary for your beloved eBook reader.
http://www.tiger-222.fr/?d=2020/04/17/22/14/21-un-dictionnaire-alternatif-et-complet-pour-votre-liseuse
MIT License
394 stars, 21 forks

Investigate the use of selectolax to replace bs4 #2071

Open BoboTiG opened 2 months ago

BoboTiG commented 2 months ago

https://github.com/rushter/selectolax is an HTML5 parser, and its benchmark shows an extraordinarily improved parsing time compared to bs4.

Let's do some tests to see if we can get something faster using selectolax with its fastest backend (Lexbor).
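
A minimal sketch of what such a test could look like, assuming both `beautifulsoup4` and `selectolax` are installed. The sample HTML and the helper names (`links_bs4`, `links_selectolax`, `bench`) are hypothetical and not taken from the project; a real comparison would use the pages that check-word actually downloads.

```python
import timeit

# Hypothetical sample document; real tests should use pages fetched by check-word.
HTML = "<html><body>" + "".join(
    f'<p class="entry"><a href="/w/{i}">word {i}</a></p>' for i in range(1000)
) + "</body></html>"


def links_bs4(html):
    """Extract the text of every link using BeautifulSoup."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text() for a in soup.find_all("a")]


def links_selectolax(html):
    """Extract the text of every link using selectolax's Lexbor backend."""
    from selectolax.lexbor import LexborHTMLParser

    tree = LexborHTMLParser(html)
    return [a.text() for a in tree.css("a")]


def bench(func, html, repeat=5):
    """Return the best wall-clock time (in seconds) over `repeat` single runs."""
    return min(timeit.repeat(lambda: func(html), number=1, repeat=repeat))


if __name__ == "__main__":
    for name, func in [("bs4", links_bs4), ("selectolax", links_selectolax)]:
        try:
            print(f"{name}: {bench(func, HTML):.4f}s")
        except ImportError:
            # The library is imported lazily, so a missing one is simply skipped.
            print(f"{name}: not installed, skipped")
```

This only times parsing plus a trivial extraction, so it says nothing about network time or about the text-replacement logic in check-word.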

lasconic commented 1 week ago

BS is only used in check-word(s) and in some of the scripts that update the lang data. I did a quick test on check-word: on my computer, parsing takes between 20% and 33% of the total time, while actually fetching the data takes the rest (67% to 80%). So there is a potential gain of roughly 25% at best. Not bad.
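
To make that upper bound concrete, here is a small back-of-the-envelope sketch (Amdahl's law applied to the parsing step). The 20% and 33% shares come from the measurement above; the parser speedup factors are purely hypothetical, and `overall_time_saved` is an illustrative helper, not project code.

```python
def overall_time_saved(parse_fraction, parser_speedup):
    """Fraction of total run time saved when only the parsing step gets faster.

    parse_fraction: share of the total time spent parsing (0..1).
    parser_speedup: how many times faster the new parser is (hypothetical).
    """
    if not 0 <= parse_fraction <= 1:
        raise ValueError("parse_fraction must be between 0 and 1")
    if parser_speedup <= 0:
        raise ValueError("parser_speedup must be positive")
    return parse_fraction * (1 - 1 / parser_speedup)


if __name__ == "__main__":
    for fraction in (0.20, 0.33):
        for speedup in (2, 10, float("inf")):
            saved = overall_time_saved(fraction, speedup)
            print(f"parsing={fraction:.0%} speedup={speedup}x saved={saved:.1%}")
```

Even with an infinitely fast parser, the saving cannot exceed the parsing share itself, because the time spent getting the data is untouched.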

lasconic commented 1 week ago

I did the change in the scripts. On my computer the result is not crazy: on a single run, I don't get any boost. I tried to change check_word, but that's another level... We (I...) did some ugly stuff in there to replace text, etc. I'm not even sure it's all possible with the selectolax API, which is centered on CSS (though that could just be my only average knowledge of CSS).

lasconic commented 1 week ago

https://github.com/lasconic/ebook-reader-dict/tree/fix-2071

BoboTiG commented 1 week ago

:thinking: What do you suggest? The idea was cool at the beginning, but in the end this is not an impressive win, and it's less complex to keep BS4? I am OK with that :)