cavazos-lab / PolyBench-ACC

What is the difference between this version and the one on the website? #4

Open · rjfnobre opened this issue 8 years ago

rjfnobre commented 8 years ago

Regarding the OpenCL and the CUDA implementations, what is the difference between this version and the one on the website?

http://web.cse.ohio-state.edu/~pouchet/software/polybench/GPU/

rjfnobre commented 8 years ago

And what about the OpenMP versions in this repository?

Were the pragmas added manually so that the OpenMP versions would perform better than the original sequential C versions? Or was an automatic parallelization tool (such as PLUTO: http://pluto-compiler.sourceforge.net/) used to decide where to put the OpenMP pragmas and/or to apply other transformations (such as loop tiling)?

Do the OpenMP versions scale well with the number of CPU cores? Are they faster than the original sequential C versions even with only two cores? I ask because, in my experience, if one is not careful when using OpenMP, one easily ends up with a poor implementation (an example of this kind of pitfall is sketched after this comment).

I searched for that information in the paper "Auto-tuning a High-Level Language Targeted to GPU Codes" but I could not find it.
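
(Aside, for illustration only and not code from this repository: the kind of OpenMP pitfall referred to above often looks like a reduction loop parallelized without a `reduction` clause, which is both wrong and slow.)

```c
/* Classic OpenMP pitfall: unsynchronized updates to a shared
 * accumulator.  Compile with: gcc -O2 -fopenmp */
#include <stdio.h>

double dot_racy(const double *x, const double *y, int n)
{
    double sum = 0.0;
    #pragma omp parallel for           /* BUG: every thread updates  */
    for (int i = 0; i < n; i++)        /* the shared variable sum,   */
        sum += x[i] * y[i];            /* a data race                */
    return sum;
}

double dot_correct(const double *x, const double *y, int n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)  /* each thread keeps a */
    for (int i = 0; i < n; i++)                /* private partial sum */
        sum += x[i] * y[i];                    /* combined at the end */
    return sum;
}

int main(void)
{
    enum { N = 1000000 };
    static double x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }
    printf("racy:    %.0f\n", dot_racy(x, y, N));    /* often != 2000000 */
    printf("correct: %.0f\n", dot_correct(x, y, N)); /* always 2000000   */
    return 0;
}
```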

sgrauerg commented 8 years ago

The OpenCL and CUDA implementations are similar in both versions (the benchmarks themselves should be the same), but the implementation on GitHub is newer: it is based on PolyBench 3.2, contains more benchmarks, and adds OpenACC and OpenMP implementations, whereas the implementation at http://web.cse.ohio-state.edu/~pouchet/software/polybench/GPU/ is based on PolyBench 2.0.

I'll note that the results presented in the paper "Auto-tuning a High-Level Language Targeted to GPU Codes" used the older version (the one based on PolyBench 2.0).

-Scott Grauer-Gray sgrauerg@gmail.com

sgrauerg commented 8 years ago

I did not write the OpenMP versions (they were not part of the original work presented in the paper), but I believe the pragmas were added manually, and the OpenMP versions should outperform the sequential versions and scale with the number of CPU cores. (A sketch of this general style of manual parallelization follows below.)

-Scott Grauer-Gray sgrauerg@gmail.com
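
For reference, here is a hypothetical sketch of what this "manual pragma" style looks like on a PolyBench-like loop nest. It is illustrative only, not the repository's actual code; the kernel, sizes, and names are assumptions.

```c
/* Illustrative sketch only -- not the actual PolyBench-ACC source.
 * The sequential loop nest is kept unchanged and an OpenMP pragma
 * parallelizes the outer loop.  Compile with: gcc -O2 -fopenmp */
#define N 1024

/* Sequential PolyBench-style matrix multiply: C = A * B. */
static void gemm_seq(double A[N][N], double B[N][N], double C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* OpenMP version: rows of C are independent, so parallelizing the
 * outer i loop is safe with no other changes (j and k, declared
 * inside the loop, are automatically private to each thread). */
static void gemm_omp(double A[N][N], double B[N][N], double C[N][N])
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}
```

Because no loop transformations (tiling, fusion, etc.) are applied in this style, scaling depends mainly on whether the kernel is compute-bound or memory-bound; running with OMP_NUM_THREADS=2 versus OMP_NUM_THREADS=1 is the quickest way to check the two-core question above.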
