csgillespie / benchmarkme

Crowd sourced benchmarking
https://csgillespie.github.io/benchmarkme/

Bm parallel #11

Closed jknowles closed 7 years ago

jknowles commented 7 years ago

This adds the ability to pass cores as a parameter to benchmark_std() and produce a results table. It also adds a new function, bm_parallel, a wrapper called within run_benchmarks() that allows any benchmark to be run over a user-specified number of cores.

bm_parallel("matrix_calc_manip", runs = 3, verbose = TRUE, cores = 2)
benchmark_std(runs = 3, cores = 4)

Consequently, calling:

benchmark_std()

now returns a results object with an additional sixth column, cores, set to 0.

Currently, because benchmark_io is structured differently from the other benchmarks, bm_parallel does not work with it. I have to do a little more thinking about how to spread out the disk read and write aspects and what measuring that in parallel would mean.

Check it out and let me know what can be improved!

codecov-io commented 7 years ago

Codecov Report

Merging #11 into master will increase coverage by 3.56%. The diff coverage is 80.41%.

@@           Coverage Diff           @@
##           master   #11      +/-   ##
=======================================
+ Coverage   55.43%   59%   +3.56%     
=======================================
  Files          21    21              
  Lines         570   644      +74     
=======================================
+ Hits          316   380      +64     
- Misses        254   264      +10
| Impacted Files | Coverage Δ |
|---|---|
| R/zzz.r | 100% <ø> (ø) |
| R/benchmark_std.R | 0% <0%> (ø) |
| R/benchmark_io.R | 100% <100%> (ø) |
| R/benchmarks.R | 47.45% <63.63%> (+47.45%) |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update fd99acc...57ae2cd.

jknowles commented 7 years ago

I also cleaned up the Travis build so it runs faster, which leaves more time to add unit tests. I can expand the unit tests run on Travis to make the code coverage bot happy if you like.

csgillespie commented 7 years ago

Just to let you know, my schedule is hectic at the moment - I've agreed to give a DataCamp course in two weeks, and I'm away at a conference. I'll try and look at this in the near future. The PR is much appreciated.

Should I change how benchmark_io is structured to make it easier to use?

jknowles commented 7 years ago

No problem about the delay -- I appreciate the heads up though. I can always use my own fork/branch in the meantime.

I think benchmark_io might merit some more thought about what you want to emphasize in the benchmark. One way of structuring it would stress disk throughput more, while another would stress CPU threading more. But I'm new to this.

If it's important, I can take a look at it in the next few weeks. If you're OK with it being modular, I'd like to take more time to learn about it when I have more bandwidth and come back to it.

csgillespie commented 7 years ago

I think the modular way is optimal; happy to wait a couple of weeks. I would prefer getting it right, since people will run the benchmark once. We can't just change the benchmark and ask them to run it again.

jknowles commented 7 years ago

Just added another push that splits benchmark_io into bm_read_io and bm_write_io and allows them to be used with the same bm_parallel command.

Test this out and see what you think when you get a chance.

jknowles commented 7 years ago

No pressure @csgillespie -- just a reminder about this.

csgillespie commented 7 years ago

@jknowles Thanks. I'm planning on being back in action on the 18th.

csgillespie commented 7 years ago

I'm almost back in action. I've started playing about with your changes (mainly the io parallel ones). A few comments:

When I run:

R> benchmark_io(runs = 1, size = 5)

I get

# IO benchmarks (2 tests) for size 5 MB:
     Reading a csv with 625000 values: 1.86 (sec).
     Writing a csv with 625000 values: 0.642 (sec).

But

R> benchmark_io(runs = 1, size = 5, cores = 1)

Only gives

# IO benchmarks (2 tests) for size 5 MB: in parallel

Also running

R> benchmark_io(runs = 3, size = c(5,50), cores = 2)
# IO benchmarks (2 tests) for size 5 MB: in parallel
# IO benchmarks (2 tests) for size 50 MB: in parallel
    user system elapsed    test test_group cores
2  0.044  0.096  59.613  read50     read50     1
3  0.044  0.100  54.722  read50     read50     1
4  0.076  0.088  61.472  read50     read50     1
5  0.052  0.048  38.096  read50     read50     2
6  0.048  0.132  35.596  read50     read50     2
7  0.016  0.080  29.285  read50     read50     2
21 0.040  0.024  21.334 write50    write50     1
31 0.032  0.060  19.791 write50    write50     1
41 0.012  0.024  19.678 write50    write50     1
51 0.024  0.044  11.435 write50    write50     2
61 0.020  0.020   9.965 write50    write50     2
71 0.020  0.020   9.953 write50    write50     2

I think the write5 and read5 results have been mislabeled as read50 and write50.

Looks good!

jknowles commented 7 years ago

@csgillespie I'm back in action on this now too :-)

Any other feedback besides the above?

For the above issue, do you think the behavior of the two calls should be identical from the user's perspective? That is, do we just need to make sure the message presented to the user is the same, or are you concerned that the code underneath runs in parallel when cores = 1?

cores = 1 executes a different code path from cores = NULL. I wasn't sure how to resolve that -- I could drop the original code and keep only the parallel path, or special-case cores = 1 to fall back to the serial path. The issue is that running in parallel requires do.call and setting up the parallel environment, which adds overhead and makes cores = 1 and cores = NULL not equivalent.
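To illustrate the overhead argument, here is a minimal sketch (not the package's actual implementation -- time_benchmark and bm_fun are hypothetical names) of why a cores = 1 parallel path cannot match the cores = NULL serial path: the cluster start-up and worker communication are paid even with a single worker.

```r
# Sketch only. `bm_fun` stands in for any benchmark function.
library(parallel)

time_benchmark = function(bm_fun, runs, cores = NULL) {
  if (is.null(cores)) {
    # Serial path: no cluster, no serialization overhead.
    system.time(for (i in seq_len(runs)) bm_fun())
  } else {
    # Parallel path: even with cores = 1 we pay for cluster start-up
    # and shipping the function to the worker, so timings differ
    # from the serial path.
    cl = makeCluster(cores)
    on.exit(stopCluster(cl))
    system.time(parLapply(cl, seq_len(runs), function(i) bm_fun()))
  }
}

time_benchmark(function() sum(rnorm(1e5)), runs = 3)             # serial
time_benchmark(function() sum(rnorm(1e5)), runs = 3, cores = 1)  # "parallel"
```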

jknowles commented 7 years ago

Any update @csgillespie -- no rush on my end, but I'd like to finish this up for you when you have time to weigh in :-)

csgillespie commented 7 years ago

Sorry, yes, I'll get back on it. I've forked your PR and made a few changes. Your PR highlighted a bug (I can't remember what).

Let's try and wrap it up next week.

jknowles commented 7 years ago

No problem -- let me know when you need something from me!

csgillespie commented 7 years ago

My git skills were not up to the task of submitting a PR to your repo with my tweaks. Thanks again for your input.

Fixes #7