aejenk / glitchup

A databender library/executable made in Rust.
Mozilla Public License 2.0
1 stars 0 forks source link

Attempt conversion to parallel iteration #22

Closed aejenk closed 5 years ago

aejenk commented 5 years ago

So far the mutations perform serially - byte by byte. An attempt should be made to switch to parallel iterations using the rayon crate.

To show why this would be useful (if it works), consider that chunk sizes can be incredibly large. For example, for a file that is 1MB, a chunk size can be 1000 bytes. This already can benefit from parallelism. This case is also very minimal - many "useful" chunk sizes would exceed this amount, unless a relatively small (10KB) file is used.

For this purpose, the speed of mutations for exponentially increasing chunksizes should be recorded (1, 10, 100, 1000, 10000, 100000) and compared between parallel, and serial.

Note: Microbenchmarking - or a feature to implement time-taking can be used to facilitate this.

aejenk commented 5 years ago

An idea for parallellism is to parallellize the benders. Each row of mutations can be split into its own thread, for example, so for [[a,b],[c,d]], [a,b] will be run parallel to [c,d].

In this case, a Bender would create a new thread for each list of iterations - each going (CopyFile + Mutations * amnt_of_muts) * iters. This should speed things up.

Benchmarking can be done using hyperfine

aejenk commented 5 years ago

Performed major refactoring to prepare for parallellism. Refactoring didn't produce any significant performance changes:

Hyperfine 'cargo run' before refactor

Benchmark #1: cargo run
  Time (mean ± σ):      82.2 ms ±   6.6 ms    [User: 65.4 ms, System: 10.4 ms]
  Range (min … max):    72.5 ms … 101.3 ms    37 runs

Hyperfine 'cargo run' after refactor

Benchmark #1: cargo run
  Time (mean ± σ):      79.5 ms ±   7.3 ms    [User: 61.8 ms, System: 12.3 ms]
  Range (min … max):    70.4 ms …  97.9 ms    40 runs

Infact, the refactor might have improved performance, if only slightly.

aejenk commented 5 years ago

Parallellism finished. Performance increase ranges from 36% (4 lists of mutations) to 61% (4 lists + 100 loops)