Open eagleflo opened 1 year ago
Looking back at this, I've come up with some ideas:
Out of these, I'm right now most intrigued by the third option, as it would cut the amount of data to a third right away and provide an extensible base for future needs. I'll give it a try.
It's quite cumbersome to try to read an SQLite database that's embedded in the binary. I might come back to this approach later, but for now I'll just compress the JSON files with flate2. This is already a marked improvement.
Compressing the JSON files shrinks the binary from 121 MB to 32 MB. However, it also results in a hefty performance degradation:
~/jisho (compress-dictionaries) % ./bench
Finished release [optimized] target(s) in 0.02s
Benchmark 1: cargo run --release 緑
Time (mean ± σ): 285.1 ms ± 4.3 ms [User: 205.4 ms, System: 79.0 ms]
Range (min … max): 279.6 ms … 294.7 ms 10 runs
Benchmark 1: cargo run --release みどり
Time (mean ± σ): 326.9 ms ± 6.6 ms [User: 234.9 ms, System: 91.2 ms]
Range (min … max): 321.6 ms … 344.8 ms 10 runs
Benchmark 1: cargo run --release green
Time (mean ± σ): 641.5 ms ± 4.5 ms [User: 496.0 ms, System: 144.1 ms]
Range (min … max): 635.1 ms … 648.4 ms 10 runs
compared to
~/jisho (main) % ./bench
Compiling jisho v0.1.7 (/home/aku/jisho)
Finished release [optimized] target(s) in 22.98s
Benchmark 1: cargo run --release 緑
Time (mean ± σ): 204.7 ms ± 1.9 ms [User: 137.3 ms, System: 66.8 ms]
Range (min … max): 201.6 ms … 207.8 ms 14 runs
Benchmark 1: cargo run --release みどり
Time (mean ± σ): 232.6 ms ± 2.7 ms [User: 147.5 ms, System: 84.2 ms]
Range (min … max): 229.0 ms … 237.6 ms 12 runs
Benchmark 1: cargo run --release green
Time (mean ± σ): 448.2 ms ± 4.7 ms [User: 295.3 ms, System: 151.7 ms]
Range (min … max): 441.0 ms … 454.0 ms 10 runs
Slowing down the quick CLI lookup use case by roughly 40% is a dealbreaker. I'll figure out something else.
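For reference, here is a minimal sketch of how the relative slowdown can be worked out from the mean times in the two benchmark runs above. The helper name `slowdown_percent` and the hard-coded table are only for illustration; neither is part of jisho.

```rust
/// Relative slowdown of `after_ms` compared to `before_ms`, in percent.
/// Hypothetical helper for illustration only; not part of jisho.
fn slowdown_percent(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms * 100.0
}

fn main() {
    // (query, mean on main, mean on compress-dictionaries), in milliseconds,
    // copied from the benchmark output above.
    let runs = [
        ("緑", 204.7, 285.1),
        ("みどり", 232.6, 326.9),
        ("green", 448.2, 641.5),
    ];

    for &(query, before, after) in runs.iter() {
        println!("{}: +{:.1}%", query, slowdown_percent(before, after));
    }
    // Prints:
    // 緑: +39.3%
    // みどり: +40.5%
    // green: +43.1%
}
```

The slowdown is fairly uniform across the three queries, which would be consistent with a fixed decompression cost paid on every lookup, though I haven't profiled it to confirm.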
I keep thinking that moving from JSON files to SQLite would most likely be a big improvement here, in addition to being more flexible in other ways.
As of now, jisho is quite a large binary, as no effort whatsoever has been spent on optimizing for binary size. However, it looks like Rust tooling has (recently?) grown more aware of binary sizes, and trying to update the embedded JMdict to a more recent version triggered a built-in size limit on crates.io. The JSON files derived from JMdict are certainly much more verbose than necessary, so this should be relatively easy to fix.