Closed: jknowles closed this pull request 7 years ago
Merging #11 into master will increase coverage by 3.56%. The diff coverage is 80.41%.
```diff
@@            Coverage Diff            @@
##           master      #11      +/-  ##
=========================================
+ Coverage   55.43%      59%    +3.56%
=========================================
  Files          21       21
  Lines         570      644       +74
=========================================
+ Hits          316      380       +64
- Misses        254      264       +10
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| R/zzz.r | 100% <ø> (ø) | :arrow_up: |
| R/benchmark_std.R | 0% <0%> (ø) | :arrow_up: |
| R/benchmark_io.R | 100% <100%> (ø) | :arrow_up: |
| R/benchmarks.R | 47.45% <63.63%> (+47.45%) | :arrow_up: |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update fd99acc...57ae2cd.
I also cleaned up the Travis build so it is faster, which leaves more time for running additional unit tests. I can expand the unit tests run on Travis to make the code-coverage bot happy if you like.
Just to let you know, my schedule is hectic at the moment: I've agreed to give a DataCamp course in two weeks, and I'm away at a conference. I'll try to look at this in the near future. The PR is much appreciated.
Should I change how benchmark_io is structured to make it easier to use?
No problem about the delay -- I appreciate the heads up though. I can always use my own fork/branch in the meantime.
I think `benchmark_io` merits some more thought about what you want the benchmark to emphasize. One way of structuring it would stress disk throughput more; another would stress CPU threading more. But I'm new to this.
If it's important, I can take a look at it in the next few weeks. If you're OK with it being modular, I'd like to come back to it when I have more bandwidth and have had time to learn more about it.
I think the modular way is optimal; happy to wait a couple of weeks. I would prefer getting it right, since people will run the benchmark once. We can't just change the benchmark and ask them to run it again.
Just added another push that splits `benchmark_io` into `bm_read_io` and `bm_write_io` and allows them to be used with the same `bm_parallel` command.
Test this out and see what you think when you get a chance.
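If I follow the split correctly, the two new functions would be driven through `bm_parallel` along these lines (the exact argument names and signatures here are my assumption, not the actual package API):

```r
# Hypothetical calls after the split into read/write benchmarks;
# "bm_read_io"/"bm_write_io" are passed by name to the shared wrapper
bm_parallel("bm_read_io",  runs = 3, size = 5, cores = c(1, 2))
bm_parallel("bm_write_io", runs = 3, size = 5, cores = c(1, 2))
```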
No pressure @csgillespie -- just a reminder about this.
@jknowles Thanks. I'm planning on being back in action on the 18th.
I'm almost back in action. I've started playing about with your changes (mainly the io parallel ones). A few comments:
When I run:

```r
R> benchmark_io(runs = 1, size = 5)
```

I get

```
# IO benchmarks (2 tests) for size 5 MB:
Reading a csv with 625000 values: 1.86 (sec).
Writing a csv with 625000 values: 0.642 (sec).
```

But

```r
R> benchmark_io(runs = 1, size = 5, cores = 1)
```

only gives

```
# IO benchmarks (2 tests) for size 5 MB: in parallel
```
Also, running

```r
R> benchmark_io(runs = 3, size = c(5, 50), cores = 2)
```

gives

```
# IO benchmarks (2 tests) for size 5 MB: in parallel
# IO benchmarks (2 tests) for size 50 MB: in parallel
     user system elapsed    test test_group cores
2   0.044  0.096  59.613  read50     read50     1
3   0.044  0.100  54.722  read50     read50     1
4   0.076  0.088  61.472  read50     read50     1
5   0.052  0.048  38.096  read50     read50     2
6   0.048  0.132  35.596  read50     read50     2
7   0.016  0.080  29.285  read50     read50     2
21  0.040  0.024  21.334 write50    write50     1
31  0.032  0.060  19.791 write50    write50     1
41  0.012  0.024  19.678 write50    write50     1
51  0.024  0.044  11.435 write50    write50     2
61  0.020  0.020   9.965 write50    write50     2
71  0.020  0.020   9.953 write50    write50     2
```
I think `write5` and `read5` have been mislabelled as `read50` and `write50`.
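One way a relabelling like this can happen (purely illustrative; I haven't checked where it actually occurs in the code) is when the test label is built once from the whole `size` vector instead of per iteration:

```r
sizes <- c(5, 50)

# Buggy pattern: label computed outside the per-size loop, so every
# row gets the label for the last size
label <- paste0("read", sizes[length(sizes)])  # always "read50"

# Intended pattern: one label per size
labels <- paste0("read", sizes)                # "read5" "read50"
```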
Looks good!
@csgillespie I'm back in action on this now too :-)
Any other feedback besides the above?
For the above issue, do you think the behavior of the two calls should be identical from the user's point of view, i.e. we just need to make sure the message presented to the user is the same? Or are you concerned that the code underneath is running in parallel when `cores = 1`?
`cores = 1` does execute a separate code path from `cores = NULL`. I wasn't sure how to resolve that: I could drop the original code and keep only the parallel path, or make `cores = 1` fall back to the serial path. The issue is that running in parallel requires `do.call` and setting up the parallel environment, which adds overhead and makes `cores = 1` and `cores = NULL` not equivalent.
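A minimal sketch of the two code paths being discussed (function and argument names are illustrative, not the actual package internals):

```r
run_bm <- function(bm, runs, cores = NULL) {
  if (is.null(cores)) {
    # Serial path: call the benchmark directly, no cluster overhead
    lapply(seq_len(runs), function(i) do.call(bm, list()))
  } else {
    # Parallel path: even cores = 1 pays the cluster setup/teardown
    # cost, so its timings are not equivalent to the serial path
    cl <- parallel::makeCluster(cores)
    on.exit(parallel::stopCluster(cl))
    parallel::parLapply(cl, seq_len(runs), function(i) do.call(bm, list()))
  }
}
```

This is why `cores = 1` and `cores = NULL` diverge: the second branch always builds a cluster, regardless of how many workers it holds.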
Any update @csgillespie -- no rush on my end, but I'd like to finish this up for you when you have time to weigh in :-)
Sorry, yes I'll get back on it. I've forked your PR and made a few changes. Your PR highlighted a bug (can't remember what).
Let's try and wrap it up next week.
No problem -- let me know when you need something from me!
My git skills were not up to the task of submitting a PR to your repo with my tweaks. Thanks again for your input.
Fixes #7
This adds the functionality to pass `cores` as a parameter to `benchmark_std()` and produce a results table. It also adds the new function `bm_parallel`, a wrapper called within `run_benchmarks()` that allows any benchmark to be tested over a user-specified number of cores. Consequently, calling a benchmark now returns a results object that has an additional, sixth, column, `cores`, set to 0.

Currently, because `benchmark_io` is structured differently from the other benchmarks, `bm_parallel` does not work with it. I have to do a little more thinking about how to spread out the disk read and write aspects and what measuring that in parallel would mean. Check it out and let me know what can be improved!