Closed vzakhari closed 3 years ago
I am not suggesting your changes are not correct, but they are not sufficient to cause correct execution on TGL Gen12LP.
Are they working with another HW+SW combination?
jrhammon@tigerlake:~/PRK/C1z$ icx --version
Intel(R) oneAPI DPC++ Compiler 2021.1.2 (2020.10.0.1214)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.1.2/linux/bin
jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-target 10 10
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 10
Vector length = 10
OpenMP Device = 0
Solution validates
Rate (MB/s): 7.223774 Avg time (s): 0.000044
jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-alloc-target 10 10
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 10
Vector length = 10
OpenMP Device = 0
Segmentation fault (core dumped)
jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-memcpy-target 10 10
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 10
Vector length = 10
OpenMP Device = 0
Solution validates
Rate (MB/s): 5.807777 Avg time (s): 0.000055
Segmentation fault (core dumped)
jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-usm-target 10 10
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 10
Vector length = 10
OpenMP Device = 0
Failed Validation on output array
Expected checksum: 880.000000
Observed checksum: 0.000000
ERROR: solution did not validate
This one made my machine unresponsive with arguments 10 $((1024*1024*32))
and I had to power-cycle it to stop the program.
jrhammon@tigerlake:~/PRK/C1z$ LIBOMPTARGET_DEBUG=0 IGC_EnableDPEmulation=1 OverrideDefaultFP64Settings=1 ./nstream-ua-target 4 4
Parallel Research Kernels version 2020
C11/OpenMP TARGET STREAM triad: A = B + scalar * C
Number of iterations = 4
Vector length = 4
OpenMP Device = 0
Failed Validation on output array
Expected checksum: 160.000000
Observed checksum: 0.000000
ERROR: solution did not validate
I have not tried it on Gen12LP yet; I used CML GEN11.
nstream-alloc-target: passes with 10 100 and 10 10
nstream-memcpy-target: passes with 10 100, but segfaults with 10 10
nstream-usm-target: does not validate with any input
I am only fixing the obvious errors in this PR, which paves the way for more PRs in the future.
Signed-off-by: Vyacheslav Zakharin <vyacheslav.p.zakharin@intel.com>