bminixhofer / nlprule

A fast, low-resource Natural Language Processing and Text Correction library written in Rust.
Apache License 2.0
599 stars 39 forks

Improve tagger: Return iterators over `WordData`, remove groups, parallelize deserialization #70

Closed bminixhofer closed 3 years ago

bminixhofer commented 3 years ago

I had another look at the tagger today. This PR:

I see another ~30% speedup in loading the Tokenizer. This could also have a positive impact on rule checking speed, but there's some weird behavior in the local benchmark on my PC, so I have to double check.
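The iterator change from the PR title can be sketched roughly like this. Note that `Tagger` and `WordData` here are illustrative stand-ins, not nlprule's actual definitions:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct WordData {
    lemma: String,
    pos: String,
}

struct Tagger {
    tags: HashMap<String, Vec<WordData>>,
}

impl Tagger {
    // A signature like `fn get_tags(&self, word: &str) -> Vec<WordData>`
    // clones and allocates on every lookup. Returning a borrowing iterator
    // avoids the per-call allocation entirely:
    fn get_tags<'a>(&'a self, word: &str) -> impl Iterator<Item = &'a WordData> + 'a {
        // `get` yields Option<&Vec<WordData>>; `into_iter().flatten()`
        // turns a missing entry into an empty iterator.
        self.tags.get(word).into_iter().flatten()
    }
}
```

Callers that still need an owned collection can `.cloned().collect()`, while hot paths iterate without touching the allocator.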

@drahnr you might be interested in this PR. It would also be great if you could double check the speedup.

drahnr commented 3 years ago

Could you provide some instructions on how you run your benches? I just copied the .bin files from the last release, but to no avail - I assume the structure of the .bins changed as well.

```
Benchmarking load tokenizer: Warming up for 3.0000 sthread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Serialization(Io(Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }))', nlprule/benches/load.rs:6:76
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
bminixhofer commented 3 years ago

Oh, I forgot that the binaries have changed. You can rebuild them with:

```
./scripts/build_and_test.sh en xxx
```
drahnr commented 3 years ago

Alright, it was a bit more than that, but it's running now.

```
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 36.1s, or reduce sample count to 10.
load tokenizer          time:   [350.09 ms 350.94 ms 351.92 ms]

load rules              time:   [32.718 ms 32.819 ms 32.929 ms]
```
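Criterion's warning above can be silenced by doing exactly what it suggests: raising the target measurement time or lowering the sample count in the bench configuration. A minimal sketch assuming Criterion's standard API (this is not nlprule's actual bench setup, and `load_tokenizer_somehow` is a hypothetical placeholder):

```rust
use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_load(c: &mut Criterion) {
    c.bench_function("load tokenizer", |b| {
        b.iter(|| {
            // placeholder for the actual deserialization under test
            // load_tokenizer_somehow("tokenizer.bin")
        })
    });
}

criterion_group! {
    name = benches;
    // Address the warning: more measurement time, fewer samples.
    config = Criterion::default()
        .measurement_time(Duration::from_secs(40))
        .sample_size(10);
    targets = bench_load
}
criterion_main!(benches);
```

`sample_size(10)` is Criterion's minimum, which suits slow, allocation-heavy benchmarks like loading a multi-hundred-millisecond binary.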

A few observations (again, from a cargo-spellcheck PoV):

Thanks for working on the perf :heart:

drahnr commented 3 years ago

Benches were all done with

```
vendor_id   : AuthenticAMD
cpu family  : 23
model       : 113
model name  : AMD Ryzen 7 3700X 8-Core Processor
stepping    : 0
microcode   : 0x8701013
cpu MHz     : 2200.000
cache size  : 512 KB
...
```

(for perspective).

bminixhofer commented 3 years ago

> Alright, it was a bit more than that

I should document that; I guess you had to download the build dirs too.

Thanks for checking the perf! And thanks for the PR. I agree on the point about the warmup; I'll merge it.

bminixhofer commented 3 years ago

Turns out the Chunker deserialization also contributes a significant amount now that the Tokenizer is faster. There was a fairly easy fix: storing the parameters in one large vector and accessing them with (offset, length) tuples, instead of keeping one vector per feature, shaves off another ~12% of the Tokenizer loading time (relative to the already improved speed above).
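The flattening described above can be sketched like this. The names are illustrative, not the Chunker's actual fields; the point is that one large allocation deserializes much faster than many small ones:

```rust
// One Vec per feature: deserializing means one allocation per feature.
struct PerFeature {
    features: Vec<Vec<f32>>,
}

// Flattened: a single parameter buffer plus (offset, length) spans.
struct Flattened {
    params: Vec<f32>,
    spans: Vec<(usize, usize)>,
}

impl Flattened {
    fn from_per_feature(p: &PerFeature) -> Self {
        let mut params = Vec::new();
        let mut spans = Vec::new();
        for f in &p.features {
            // record where this feature's parameters start and how many there are
            spans.push((params.len(), f.len()));
            params.extend_from_slice(f);
        }
        Flattened { params, spans }
    }

    /// Access feature `i` as a slice into the shared buffer.
    fn feature(&self, i: usize) -> &[f32] {
        let (offset, len) = self.spans[i];
        &self.params[offset..offset + len]
    }
}
```

On deserialization, the flat layout is two contiguous reads instead of a length-prefixed loop of small vector allocations, which is where the loading-time win comes from.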

Unfortunately I think that's it for the reasonably low-hanging speed improvements :wink: