Multicore/Multithread Performance

csgillespie / benchmarkme

Crowd sourced benchmarking

https://csgillespie.github.io/benchmarkme/


Multicore/Multithread Performance #7

Closed jknowles closed 7 years ago

jknowles commented 7 years ago

Hi,

Great package -- I am so glad someone has done this.

For some workflows it might be nice to have a sense of multicore/multithread performance.

Any plans to add that?

If it could be done it would be a great addition to the package. If not, might you accept a pull request if I can find time to do it?

csgillespie commented 7 years ago

Thanks for the comments.

It's been on my mind, I've just never got round to it. So a PR would be welcome.

If you have an idea of what such a benchmark would look like, I'm all ears. I'm thinking that it would detect the number of cores, then run some of the existing programming benchmarks on core counts that are powers of two. E.g. if you had 8 cores, it would run the benchmark on 1, 2, 4 and 8 cores.
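The core-count sequence described above can be sketched as follows. This is a minimal illustration in Python, not the package's R code: the function name `power_of_two_core_counts` is hypothetical, and `os.cpu_count()` merely stands in for whatever core detection the package would end up using.

```python
import os


def power_of_two_core_counts(max_cores=None):
    """Return 1, 2, 4, ... up to the largest power of two <= max_cores."""
    if max_cores is None:
        # os.cpu_count() reports logical cores and may return None.
        max_cores = os.cpu_count() or 1
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    counts = []
    n = 1
    while n <= max_cores:
        counts.append(n)
        n *= 2
    return counts


print(power_of_two_core_counts(8))  # [1, 2, 4, 8]
```

On a machine whose core count is not a power of two (say 6), this sketch stops at 4. Whether to add a final run on all available cores is a design choice the thread does not settle.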

jknowles commented 7 years ago

Hi @csgillespie

I am going to try to start this on the multicore branch on my fork.

Are you opinionated about how to do the parallel version of the testing? I see two approaches:

1. Simply repeat existing benchmarks in a loop and parallelize the loop using foreach
2. Compile RcppParallel functions that do the same matrix calculations, but in parallel

Option 1 is much easier to implement and is where I am starting. If there is a good reason to prefer option 2, I'm willing to dive into doing that, but it will take much longer for me to implement some of the benchmarks that way...

csgillespie commented 7 years ago

Since we're interested in how standard R code scales across cores, using existing benchmarks would be fine. A few thoughts:

Just to double check, is foreach cross-platform?
It would make sense to run the benchmark using a number of cores that is a power of 2, i.e. 1, 2, 4, 8, 16 cores.

jknowles commented 7 years ago

I will double check that foreach is cross-platform, but I dev on Windows, and Windows is usually the laggard, right? It works great on Windows.

One difference is that Linux/Mac support process forking, which is more efficient. I'm mostly concerned personally with accurately benchmarking Windows performance, but I do not think it would be too hard to extend the work to allow forking on Mac/Linux platforms.

One thing it will do is add a bunch of dependencies. Should we make them "suggests", or go for it and make them dependencies?

csgillespie commented 7 years ago

I've used it on Linux, so it sounds like it will be fine.

Regarding dependencies, go for standard imports. Once we have a skeleton, we can re-evaluate the situation.

jknowles commented 7 years ago

Cool. Do you want me to give you a PR when I have some minimal multicore test in place, so you can check it out? I started with the matrix calculation benchmark because it was the one I was most interested in.

csgillespie commented 7 years ago

Yes please.

jknowles commented 7 years ago

Quick question @csgillespie

Would you prefer that users can compare benchmark_std() directly to the MC benchmark, or would you prefer an MC benchmark that tells users the performance of the same workflow across a sequence of core counts that are powers of 2?

The latter seems easier because the foreach overhead in the way I am doing MC probably adds some cost that benchmark_std() does not incur.

csgillespie commented 7 years ago

I think the latter is more sensible. We don't really care about comparing serial with parallel using a single core.

We would run the benchmark in parallel with one core but only to normalise the results.
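The normalisation mentioned above could look roughly like this. It is a sketch in Python under stated assumptions, not the package's implementation: `normalise_timings` is a hypothetical helper, and the timings are invented purely for illustration. Each run is divided into the one-core parallel run, so the loop overhead appears in both numerator and denominator.

```python
def normalise_timings(timings):
    """Express each timing relative to the one-core parallel run.

    timings maps a core count to elapsed seconds for the same workload.
    Returns a dict mapping core count to (speedup, efficiency).
    """
    if 1 not in timings:
        raise KeyError("a one-core run is needed as the baseline")
    baseline = timings[1]
    results = {}
    for cores in sorted(timings):
        speedup = baseline / timings[cores]
        # Efficiency is the speedup per core; 1.0 means perfect scaling.
        results[cores] = (speedup, speedup / cores)
    return results


# Hypothetical timings in seconds, invented purely for illustration.
example = {1: 40.0, 2: 22.0, 4: 12.5, 8: 8.0}
for cores, (speedup, efficiency) in normalise_timings(example).items():
    print(f"{cores} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency well below 1.0 at higher core counts would indicate that the workload does not scale linearly on that machine.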