Could you provide some instructions on how you run your benches? I just copied the `.bin` files from the last release, but to no avail - I assume the structure of the `.bin`s changed as well.
```
Benchmarking load tokenizer: Warming up for 3.0000 s
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Serialization(Io(Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }))', nlprule/benches/load.rs:6:76
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
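(For reference, the load bench essentially just deserializes the two binaries. A minimal sketch of the failing path, assuming nlprule's path-based `Tokenizer::new`/`Rules::new` constructors and placeholder `.bin` paths:)

```rust
use nlprule::{Rules, Tokenizer};

fn main() {
    // Deserializing `.bin` files produced by an older release fails with
    // `UnexpectedEof`, because the serialized format has since changed.
    let _tokenizer = Tokenizer::new("storage/en_tokenizer.bin").unwrap();
    let _rules = Rules::new("storage/en_rules.bin").unwrap();
}
```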
Oh, I forgot that the binaries have changed. You can rebuild them with:
```
./scripts/build_and_test.sh en xxx
```
Alright, it was a bit more than that, but it's running now.
```
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 36.1s, or reduce sample count to 10.
load tokenizer    time: [350.09 ms 350.94 ms 351.92 ms]
load rules        time: [32.718 ms 32.819 ms 32.929 ms]
```
A few observations (again, from the cargo-spellcheck PoV):
- Numbers look really good; we're getting closer to the point where I would stop thinking about launching another process just to avoid re-parsing - so this is a huge :+1:
- Since the data is constant, a 3.0 second warmup does not represent what a common user of cargo-spellcheck will face (talking mostly about caches here; `black_box` will only prevent the compiler from prematurely optimizing, but the actual CPU doesn't care) - see more details in https://github.com/bminixhofer/nlprule/pull/71. EDIT: even for the case of a long-running service, you are mostly interested in the one-off execution; both rules and tokenizer loading are not common hot paths, but rather one-off operations.
- criterion has options for this, so I would recommend not much of a warmup, so that the worst case represents what CLI programs will see (see the sketch below).
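For illustration, here is roughly how the warmup could be dialed down via criterion's builder (a minimal sketch; the bench name, binary path, and concrete durations are placeholders, not taken from nlprule's actual bench):

```rust
use std::time::Duration;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use nlprule::Tokenizer;

fn bench_load(c: &mut Criterion) {
    c.bench_function("load tokenizer", |b| {
        // `black_box` only stops the compiler from optimizing the load
        // away; it does nothing about warm OS page/CPU caches.
        b.iter(|| black_box(Tokenizer::new("storage/en_tokenizer.bin").unwrap()))
    });
}

criterion_group! {
    name = benches;
    // Keep the warmup short so the numbers stay close to the cold,
    // one-off execution a CLI user actually sees.
    config = Criterion::default()
        .warm_up_time(Duration::from_millis(500))
        .sample_size(10);
    targets = bench_load
}
criterion_main!(benches);
```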
Thanks for working on the perf :heart:
Benches were all done with:
```
vendor_id  : AuthenticAMD
cpu family : 23
model      : 113
model name : AMD Ryzen 7 3700X 8-Core Processor
stepping   : 0
microcode  : 0x8701013
cpu MHz    : 2200.000
cache size : 512 KB
...
```
(for perspective).
> Alright, it was a bit more than that

I should document that, I guess you had to download the build dirs too.
Thanks for checking the perf! And thanks for the PR. I agree on the point about the warmup, I'll merge it.
Turns out the `Chunker` deserialization also contributed a significant amount now that the `Tokenizer` is faster. There was a fairly easy fix: storing the parameters in one large vector and accessing them with `(offset, length)` tuples instead of having one vector for each feature, which shaves off another ~12% of loading time on the `Tokenizer` (relative to the already improved speed above).
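In spirit, the layout change is something like the following (an illustrative sketch; `FlatFeatures` and its fields are made-up names, not nlprule's actual types):

```rust
/// One flat buffer plus `(offset, length)` index tuples, instead of
/// one separately-allocated (and separately-deserialized) vector per feature.
struct FlatFeatures {
    data: Vec<f32>,             // all feature parameters, concatenated
    spans: Vec<(usize, usize)>, // (offset, length) into `data`, one per feature
}

impl FlatFeatures {
    fn feature(&self, i: usize) -> &[f32] {
        let (offset, len) = self.spans[i];
        &self.data[offset..offset + len]
    }
}

fn main() {
    let features = FlatFeatures {
        data: vec![0.1, 0.2, 0.3, 0.4, 0.5],
        spans: vec![(0, 2), (2, 3)], // feature 0 -> [0.1, 0.2], feature 1 -> [0.3, 0.4, 0.5]
    };
    assert_eq!(features.feature(1), &[0.3, 0.4, 0.5]);
}
```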
Unfortunately I think that's it for the reasonably low-hanging speed improvements :wink:
I had another look at the tagger today. This PR:
- changes the `get_tags_*` methods to return iterators instead of `Vec`s.
- removes the `groups`. These were only used by nlprule in the `PosReplacer`, which wasn't used anywhere as it is not fully implemented. Some of the currently unimplemented rules might need the `groups` in some form though, but we can probably get away with search + caching since the groups are only needed if a rule actually matches there.
- accesses the `tags` without synchronization.
- replaces the `HashMap` with a `Vec`, since the IDs go from zero to `n_words` anyway, so we don't need to hash anything (see the sketch below).

I see another ~30% speedup in loading the `Tokenizer`. This could also have a positive impact on rule checking speed, but there's some weird behavior in the local benchmark on my PC, so I have to double check.
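Roughly what the last two points amount to, as a hypothetical sketch (the `WordIdMap`/`Tag` names and method shapes are illustrative, not the actual tagger API):

```rust
struct Tag(u16);

/// Word IDs are dense (0..n_words), so a `Vec` indexed by ID can replace
/// a `HashMap` keyed by word ID: no hashing, just an array index.
struct WordIdMap {
    tags: Vec<Vec<Tag>>, // outer index = word ID
}

impl WordIdMap {
    /// Returning a borrowing iterator instead of an owned `Vec`
    /// avoids an allocation on every lookup.
    fn get_tags(&self, word_id: usize) -> impl Iterator<Item = &Tag> + '_ {
        self.tags[word_id].iter()
    }
}

fn main() {
    let map = WordIdMap {
        tags: vec![vec![Tag(0), Tag(3)], vec![Tag(1)]],
    };
    for tag in map.get_tags(0) {
        println!("tag id: {}", tag.0);
    }
}
```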
@drahnr you might be interested in this PR. It would also be great if you could double check the speedup.