bminixhofer / nlprule

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.
Apache License 2.0

Speed up tagger loading: remove IndexMap, new -> with_capacity #66

Closed bminixhofer closed 3 years ago

bminixhofer commented 3 years ago

Hey @drahnr I've had a go at speeding up loading the Tokenizer today.

I did the two things named in the title: removed the IndexMap, and replaced new with with_capacity.
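For readers unfamiliar with the "new -> with_capacity" change named in the title, here is a minimal sketch of the general idea. It is not nlprule's actual loading code: the functions `build_with_new`, `build_with_capacity`, and `sample_entries` are hypothetical, and a plain `std::collections::HashMap` stands in for whatever containers the tagger really uses.

```rust
use std::collections::HashMap;

/// Hypothetical loader: builds a word -> id map starting from an empty table.
/// As entries are inserted, the map has to reallocate and rehash as it grows.
fn build_with_new(entries: &[(String, u32)]) -> HashMap<String, u32> {
    let mut map = HashMap::new();
    for (word, id) in entries {
        map.insert(word.clone(), *id);
    }
    map
}

/// Same hypothetical loader, but the number of entries is known up front,
/// so the table is allocated once with enough room for all of them.
fn build_with_capacity(entries: &[(String, u32)]) -> HashMap<String, u32> {
    let mut map = HashMap::with_capacity(entries.len());
    for (word, id) in entries {
        map.insert(word.clone(), *id);
    }
    map
}

/// Generates made-up entries for illustration only.
fn sample_entries(n: u32) -> Vec<(String, u32)> {
    (0..n).map(|i| (format!("word{}", i), i)).collect()
}
```

Both functions produce the same map; the second only avoids the intermediate growth steps, which is where a loading speedup of this kind would come from.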

Overall I get a 25% speedup, which is something at least. I experimented a bit with parallelization, particularly setting some "anchor" points in the FST and splitting the work into chunks where each chunk iterates from one anchor point to the next, but it seems the speedup from that is nullified by the merge we have to do afterwards.
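The chunk-and-merge pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the experiment: a slice of entries stands in for the FST, chunk boundaries stand in for the "anchor" points, and `build_chunked` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical sketch: build one partial map per chunk in parallel,
/// then merge the partial maps sequentially into the final map.
fn build_chunked(entries: &[(String, u32)], n_chunks: usize) -> HashMap<String, u32> {
    // Guard against zero chunks and a zero chunk size (chunks(0) would panic).
    let n_chunks = n_chunks.max(1);
    let chunk_size = ((entries.len() + n_chunks - 1) / n_chunks).max(1);

    // Parallel phase: each scoped thread walks its own chunk.
    let partials: Vec<HashMap<String, u32>> = thread::scope(|s| {
        let handles: Vec<_> = entries
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    let mut partial = HashMap::with_capacity(chunk.len());
                    for (word, id) in chunk {
                        partial.insert(word.clone(), *id);
                    }
                    partial
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Sequential phase: every entry is inserted a second time here,
    // which is the merge cost that can cancel out the parallel gain.
    let mut merged = HashMap::with_capacity(entries.len());
    for partial in partials {
        merged.extend(partial);
    }
    merged
}
```

In this sketch the merge re-inserts every entry on a single thread, so the total hashing work is roughly doubled; that is one plausible reason a scheme like this ends up no faster than the sequential version.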

Maybe there are smarter ways to speed this up further, but I couldn't think of anything.

drahnr commented 3 years ago

This is very good news! 25% is already a noticeable improvement, sorry for dropping the ball on this :>

bminixhofer commented 3 years ago

No worries. As of release 0.6.2 you should see the speedup :)