**Closed** — jknowles closed this 7 years ago
Merging #10 into master will decrease coverage by 16.53%. The diff coverage is 0%.
```diff
@@            Coverage Diff            @@
##           master     #10       +/-  ##
==========================================
- Coverage   55.43%   38.91%    -16.53%
==========================================
  Files          21       25        +4
  Lines         570      812      +242
==========================================
  Hits          316      316
- Misses        254      496      +242
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| R/benchmark_std.R | 0% <ø> (ø) | :white_check_mark: |
| R/zzz.r | 100% <ø> (ø) | :white_check_mark: |
| R/benchmark_prog_mc.R | 0% <0%> (ø) | |
| R/get_parallel.R | 0% <0%> (ø) | |
| R/benchmark_mc.R | 0% <0%> (ø) | |
| R/benchmark_matrix_calc_mc.R | 0% <0%> (ø) | |
| R/benchmarks.R | 0% <0%> (ø) | :white_check_mark: |
| ... and 1 more | | |
Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update fd99acc...ce6e154. Read the comment docs.
Thanks for the PR; it's an excellent start. I've been thinking about this a bit more carefully over the last few days. One issue we'll have is that it won't be possible to determine the number of cores used in a benchmark. The way to solve this is to add a new column to the `timings` data frame, e.g.

```r
timings = data.frame(user = numeric(runs), system = 0, elapsed = 0,
                     test = "sort", test_group = "matrix_cal", cores = cores)
```
While this is a change from previous versions of the package, I still have all the uploaded results, so it's easy to append a new column in old data.
This brings me to two points:

1. I think we need to differentiate between running in serial and running in parallel with a single core. Perhaps set `cores = 0` when running in serial?
2. I like your idea of using the current benchmarks for the parallel versions. Why not take this one step further? We could have a function called `benchmark_parallel` whose argument is the function we want to benchmark. This would mean that `benchmark_io` could also be benchmarked in parallel if needed.
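A minimal sketch of what such a wrapper could look like, using `parallel::mclapply`. The names `benchmark_parallel` and `benchmark_fun` are assumptions for illustration, not the package's actual API:

```r
library(parallel)

# Hypothetical wrapper sketch: run an existing benchmark function once per
# worker across 'cores' forked processes, then tag the combined timings
# with the core count used.
benchmark_parallel = function(benchmark_fun, cores = 1, ...) {
  if (cores == 0) {
    # Convention suggested above: cores = 0 signals a plain serial run,
    # with no parallel machinery involved at all.
    res = benchmark_fun(...)
  } else {
    # Fork 'cores' workers (note: mclapply requires mc.cores = 1 on Windows).
    res_list = mclapply(seq_len(cores), function(i) benchmark_fun(...),
                        mc.cores = cores)
    res = do.call(rbind, res_list)
  }
  res$cores = cores
  res
}
```

Any benchmark returning a timings data frame could then be run as, say, `benchmark_parallel(benchmark_io, cores = 4)`.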
Looking forward to hearing your thoughts.
I like this idea a lot. For the benchmarks I have already implemented, I did include "cores" in the results output.
I think it is much more sensible to make a generic wrapper for the current benchmark functions that executes them in parallel. This is more elegant than this PR: it avoids duplication and allows additional benchmarks to be added in the future.
I think we should ditch this PR and its branch and let me take a whack at implementing `benchmark_parallel()` - that is a much smarter way to approach this.
I should have something for you to review by the end of the weekend -- it is straightforward to go from what I've done to that.
If you find that we need to rewrite the other benchmarks, let me know.
The general framework duplicates all of the benchmarks and implements them in a multicore fashion. I think, but am not sure, that instead of duplicating them all - if you like this approach to parallel benchmarking - we could just add an argument to loop over the benchmarks externally, instead of internally as I have done here.
The key framework is in this function:
If you like how this is structured and the way it returns results, I can continue to modify the benchmarks to make them work with this.
One issue is that for parallel execution to be tested properly, the function has to be repeated at least as many times as the maximum number of cores available. If you look at the `_mc` versions of the benchmarks I have implemented, I use a "K" parameter defined as the maximum number of cores the user wants to test, times the number of runs. This way all of the benchmarks are run the same number of times, and the test is a fair comparison between performance at, for example, 1, 2, or 4 cores.

Take a look and let me know what you think the best way to proceed is. I am confident implementing it will only take a little additional time beyond what I've done here.
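The "K" repetition scheme above can be sketched as follows. This is illustrative only; the variable names and the `sort` workload are assumptions, not the implemented benchmarks:

```r
library(parallel)

# K = max_cores * runs: every core count performs the same total number of
# repetitions, so timings at 1, 2, ... cores are directly comparable.
max_cores = 2   # largest core count the user wants to test (assumption)
runs      = 3
K         = max_cores * runs

results = do.call(rbind, lapply(seq_len(max_cores), function(cores) {
  # Repeat the benchmark K times, spread over 'cores' forked workers.
  times = mclapply(seq_len(K),
                   function(i) system.time(sort(runif(1e4)))[["elapsed"]],
                   mc.cores = cores)
  data.frame(elapsed = unlist(times), cores = cores)
}))

nrow(results)  # max_cores * K rows: each core count ran K times
```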