eagleflo / jisho

Jisho is a CLI tool & Rust library that provides a Japanese-English dictionary.
GNU General Public License v3.0

Reduce binary size #21

Open eagleflo opened 1 year ago

eagleflo commented 1 year ago

As of now, jisho is quite a large binary, as no effort whatsoever has gone into optimizing for binary size.

However, it looks like Rust tooling has (recently?) grown more aware of binary sizes, and trying to update the embedded JMdict to a more recent version hit a built-in size limit on crates.io. The JSON files derived from JMdict are certainly much more verbose than necessary, so this should be relatively easy to fix.

eagleflo commented 1 year ago

Looking back at this, I've come up with some ideas:

Out of these, I'm currently most intrigued by the third option, as it would cut the amount of data to a third right away and provide an extensible base for future needs. I'll give it a try.

eagleflo commented 1 year ago

It's quite cumbersome to try to read an SQLite database that's embedded in the binary. I might come back to this approach later, but for now I'll just compress the JSON files with flate2. This is already a marked improvement.
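Part of what makes this cumbersome is that SQLite is normally opened from a path on disk, while data embedded in a binary is only a byte slice in memory. As a minimal sketch of one possible workaround (not what jisho does), the embedded bytes can be written to a temporary file once and the resulting path handed to whichever SQLite binding is in use. The names `EMBEDDED_DB` and `materialize` and the file name are hypothetical, and a short byte string stands in for the real `include_bytes!` payload.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Stand-in for something like `include_bytes!("jmdict.sqlite3")`; that file
// name is hypothetical. These 16 bytes are the magic header of an SQLite file.
static EMBEDDED_DB: &[u8] = b"SQLite format 3\0";

/// Write the embedded bytes into the system temp directory and return the path.
/// The write is skipped if a file of the same length is already there, which is
/// only a crude freshness check, good enough for a sketch.
fn materialize(bytes: &[u8], file_name: &str) -> io::Result<PathBuf> {
    let path = std::env::temp_dir().join(file_name);
    let up_to_date = fs::metadata(&path)
        .map(|meta| meta.len() == bytes.len() as u64)
        .unwrap_or(false);
    if !up_to_date {
        fs::write(&path, bytes)?;
    }
    Ok(path)
}

fn main() -> io::Result<()> {
    let path = materialize(EMBEDDED_DB, "jisho-sketch.sqlite3")?;
    println!("database available at {}", path.display());
    Ok(())
}
```

The first run pays for a full write of the database, and the binary still carries the whole file, so this addresses only the "how to open it" part, not the size.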

eagleflo commented 10 months ago

Compressing the JSON files shrinks the binary from 121 MB to 32 MB. However, it also causes a hefty performance degradation:

~/jisho (compress-dictionaries) % ./bench
    Finished release [optimized] target(s) in 0.02s
Benchmark 1: cargo run --release 緑
  Time (mean ± σ):     285.1 ms ±   4.3 ms    [User: 205.4 ms, System: 79.0 ms]
  Range (min … max):   279.6 ms … 294.7 ms    10 runs

Benchmark 1: cargo run --release みどり
  Time (mean ± σ):     326.9 ms ±   6.6 ms    [User: 234.9 ms, System: 91.2 ms]
  Range (min … max):   321.6 ms … 344.8 ms    10 runs

Benchmark 1: cargo run --release green
  Time (mean ± σ):     641.5 ms ±   4.5 ms    [User: 496.0 ms, System: 144.1 ms]
  Range (min … max):   635.1 ms … 648.4 ms    10 runs

compared to

~/jisho (main) % ./bench
   Compiling jisho v0.1.7 (/home/aku/jisho)
    Finished release [optimized] target(s) in 22.98s
Benchmark 1: cargo run --release 緑
  Time (mean ± σ):     204.7 ms ±   1.9 ms    [User: 137.3 ms, System: 66.8 ms]
  Range (min … max):   201.6 ms … 207.8 ms    14 runs

Benchmark 1: cargo run --release みどり
  Time (mean ± σ):     232.6 ms ±   2.7 ms    [User: 147.5 ms, System: 84.2 ms]
  Range (min … max):   229.0 ms … 237.6 ms    12 runs

Benchmark 1: cargo run --release green
  Time (mean ± σ):     448.2 ms ±   4.7 ms    [User: 295.3 ms, System: 151.7 ms]
  Range (min … max):   441.0 ms … 454.0 ms    10 runs

Slowing down the quick CLI lookup use case by roughly 40% is a dealbreaker. I'll figure out something else.
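For reference, the relative slowdown can be read straight off the mean times in the two benchmark runs. A small sketch of the arithmetic follows; the helper name `slowdown_percent` is made up for illustration, and the numbers are the reported means, which include whatever overhead `cargo run` itself adds.

```rust
/// Relative slowdown of `after_ms` compared to `before_ms`, in percent.
fn slowdown_percent(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms / before_ms - 1.0) * 100.0
}

fn main() {
    // (query, mean on main, mean on compress-dictionaries), in milliseconds,
    // copied from the benchmark output in this thread.
    let runs = [
        ("緑", 204.7, 285.1),
        ("みどり", 232.6, 326.9),
        ("green", 448.2, 641.5),
    ];
    for &(query, before, after) in runs.iter() {
        println!("{}: +{:.1}%", query, slowdown_percent(before, after));
    }
}
```

This works out to roughly 39%, 41% and 43% for the three queries.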

eagleflo commented 6 months ago

I keep thinking moving from JSON files to SQLite would most likely be a big improvement here, in addition to being more flexible in other ways.