UCL / openqcd-oneapi

GNU General Public License v2.0

Add hipsycl compiler on csd3 and GA #27

Closed tkoskela closed 2 years ago

tkoskela commented 2 years ago

Closes #4

Now that hipSYCL 0.9.2 is installed on csd3, we can build with it. I've renamed the targets in the Makefile so that they cover both oneAPI and hipSYCL, each for CPU and NVIDIA GPU. I've also added job scripts to run the 64-64-64-64 test case with these, although the changes are fairly trivial.

I've also added a CI workflow that installs hipSYCL from the University of Heidelberg's repository, builds main for a CPU target, and runs the 16-16-16-16 test case.

mkappas commented 2 years ago

I'm trying to test the hipSYCL-compiled version on the Cambridge Icelake nodes, but I'm getting a Segmentation fault (core dumped) error. Steps to reproduce:

I cloned the repo, ran git switch tk/hipsycl, and requested an Icelake node with srun -t 00:30:00 -A DIRAC-DR004-CPU --nodes=1 --exclusive -p icelake --pty bash. Then:

cd openqcd-oneapi/tests/cuda2/dpct_output

make -f Makefile.csd3 hip_omp_cpu

module purge
module load rhel8/default-amp
module load hipsycl/0.9.2/gcc-9.4.0-jg2gfgh
module load gcc/9.4.0

./main.hip_omp_cpu 16 16 16 16 ../../../data/

and I get:

List of detected devices:
hipSYCL OpenMP host device
Selected device: hipSYCL OpenMP host device
Time for AoS to SoA for pauli m +H2D (GPU) (ms): 40.71
Time for AoS to SoA for su3 u +H2D (GPU) (ms): 19.78
Time for AoS to SoA for spinor s +H2D (GPU) (ms): 2.87
Time for cudaMemcpy H2D of lookup tables (ms): 0.53
Time for kernel mul_pauli (ms): 12.15
Segmentation fault (core dumped)

There is also a minor error in Makefile.csd3 on lines 37 and 44: module load load should be module load.
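
A minimal sketch of one way to locate and collapse the doubled word, assuming the typo appears literally as "module load load". The function name fix_doubled_module_load is hypothetical and not part of the repository:

```shell
# Sketch: print the lines containing a doubled "module load load",
# then rewrite them in place, keeping a .bak copy of the original file.
fix_doubled_module_load() {
    local file="$1"
    # Show the affected lines with their line numbers (no output if none).
    grep -n 'module load load' "$file" || true
    # Collapse the doubled word; -i.bak works with both GNU and BSD sed.
    sed -i.bak 's/module load load/module load/g' "$file"
}
```

For example, fix_doubled_module_load Makefile.csd3 would print the matching lines and leave the untouched original in Makefile.csd3.bak.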

tkoskela commented 2 years ago

Thank you for the bug report! I had been testing with the 64 64 64 64 data set, and with that it does not segfault. Curiously, it also seems to run fine on GitHub Actions.

Ahh, git lfs does not pull the files by default when you do a git clone. The files in ../../../data are just LFS pointers with no data behind them, but the code only checks that the files exist and then tries to read them as binary files. So on csd3 you need to run

module load git-lfs-2.3.0-gcc-5.4.0-cbo6khp
git lfs pull

to get the actual input files, and that will fix the segfault.
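
This failure mode (data files that exist but are still LFS pointers) can be caught before launching the binary. A minimal sketch, assuming the data directory holds files tracked by Git LFS and that an unfetched pointer file begins with the Git LFS v1 version line. The function names is_lfs_pointer and check_data_dir are hypothetical and not part of the repository:

```shell
# Sketch: detect data files that are still Git LFS pointers.
# Assumption: a pointer file starts with exactly this 42-byte line.
LFS_MAGIC='version https://git-lfs.github.com/spec/v1'

is_lfs_pointer() {
    # Compare only the first 42 bytes, so large binary files are not read
    # in full; -a makes grep treat binary input as text, -x -F require an
    # exact whole-line match against the fixed string.
    head -c 42 "$1" 2>/dev/null | grep -qaxF "$LFS_MAGIC"
}

check_data_dir() {
    # Returns 0 if no regular file in the directory is an LFS pointer,
    # 1 otherwise, printing each offending path.
    local dir="$1" found=0 f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if is_lfs_pointer "$f"; then
            echo "LFS pointer, not real data: $f"
            found=1
        fi
    done
    return "$found"
}
```

A job script could then run something like check_data_dir ../../../data || git lfs pull before calling the executable, so that a missing pull shows up as a clear message and not as a segfault.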

mkappas commented 2 years ago

I thought you used Git LFS only for the bigger files. For some reason I assumed that the "16 16 16 16" data set had been pushed without LFS, and I didn't check further. My bad. I have now pulled everything and re-tested, and I can confirm that it works as expected. I have also tested the "oneapi_intel_cpu" build, and it still produces the correct results.

Approved.

tkoskela commented 2 years ago

You can approve explicitly by going to the changed files and clicking on Start a Review. Thanks!