ParRes / Kernels

This is a set of simple programs that can be used to explore the features of a parallel platform.
https://groups.google.com/forum/#!forum/parallel-research-kernels

Fixed device numbers for memory APIs. #536

Closed vzakhari closed 3 years ago

vzakhari commented 3 years ago

Signed-off-by: Vyacheslav Zakharin <vyacheslav.p.zakharin@intel.com>

If this pull request is fixing a bug, please link the associated issue. The rest of this template does not apply.

If this pull request is providing a new implementation of the PRKs, please use the following template.

Note that checking all of the boxes is not required.

New PRK implementation checklist

Which kernels are implemented?

Documentation and build examples

If your implementation uses a new programming model that is not ubiquitous (i.e. included in the system compiler on most systems) then you need to provide a link to the appropriate documentation for a new user to install it, etc.

We strongly recommend that you add the relevant features to make.defs.${toolchain} where applicable.

Do you certify that your contribution is made in good faith and does not attempt to introduce any negative behavior into this project?

jeffhammond commented 3 years ago

I am not suggesting your changes are not correct, but they are not sufficient to cause correct execution on TGL Gen12LP.

Are they working with another HW+SW combination?

jrhammon@tigerlake:~/PRK/C1z$ icx --version
Intel(R) oneAPI DPC++ Compiler 2021.1.2 (2020.10.0.1214)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.1.2/linux/bin
jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-target 10 10
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 10
Vector length        = 10
OpenMP Device        = 0
Solution validates
Rate (MB/s): 7.223774 Avg time (s): 0.000044
jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-alloc-target 10 10
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 10
Vector length        = 10
OpenMP Device        = 0
Segmentation fault (core dumped)
jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-memcpy-target 10 10
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 10
Vector length        = 10
OpenMP Device        = 0
Solution validates
Rate (MB/s): 5.807777 Avg time (s): 0.000055
Segmentation fault (core dumped)
jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-usm-target 10 10
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 10
Vector length        = 10
OpenMP Device        = 0
Failed Validation on output array
       Expected checksum: 880.000000
       Observed checksum: 0.000000
ERROR: solution did not validate

This one made my machine unresponsive with arguments 10 $((1024*1024*32)) and I had to power-cycle it to stop the program.

jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-ua-target 4 4
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 4
Vector length        = 4
OpenMP Device        = 0
Failed Validation on output array
       Expected checksum: 160.000000
       Observed checksum: 0.000000
ERROR: solution did not validate

vzakhari commented 3 years ago

I did not try it on Gen12LP yet. I used CML GEN11.

nstream-alloc-target: passes with 10 100 and 10 10
nstream-memcpy-target: passes with 10 100, but segfaults with 10 10
nstream-usm-target: does not validate with any input

I am only fixing obvious errors in this PR, and kind of paving the way for more PRs in the future.