UCL / openqcd-oneapi

GNU General Public License v2.0

Document setting up cuda tests #22

Closed tkoskela closed 2 years ago

tkoskela commented 2 years ago

Create a markdown doc in the repo

tkoskela commented 2 years ago

Generating reference data from the CPU code qcd1

Follow the instructions in Dump_memory_guide.txt to dump the memory before and after the single-precision Dirac-Wilson operator. I'm fairly sure that the line numbers I specify there apply to a build compiled with the "-g" option and without any intrinsics, because when the CPU code is compiled without intrinsics the compiler removes all those preprocessor directives and macros. You also need to specify the local lattice sizes in each dimension (L0, L1, L2, L3) and run on a single core.

I think I used the branch feature/library to get the binary files from the CPU version. From what I remember, one of the differences in this branch is that it lets you specify the size of the simulation at runtime rather than at compile time, via an input file (see DYNAMIC_SIZES). I've attached a sample compile_settings.txt and an input.in file here. You have to manually create the log/, dat/ and cnfg/ directories at the paths specified in input.in (either absolute or relative to input.in).

Then use gdb to dump the specified variables (piup, pidn, s, r, u, m). Save them as:

piup-L0-L1-L2-L3
pidn-L0-L1-L2-L3
sp-s-L0-L1-L2-L3
sp-r-L0-L1-L2-L3
sp-u-L0-L1-L2-L3
sp-m-L0-L1-L2-L3

The variable r is the output, so it should be dumped after the loop. These files will be read by the CUDA version here.

For example, if you used L0=L1=L2=L3=16 the files should be:

piup-16-16-16-16
pidn-16-16-16-16
sp-s-16-16-16-16
sp-r-16-16-16-16
sp-u-16-16-16-16
sp-m-16-16-16-16

When you run the CUDA version, invoke it as:

executable L0 L1 L2 L3 /path/to/the/files

For example: executable 16 16 16 16 /path/to/the/files
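
The naming convention above can be captured in a small helper that lists the expected files and reports which ones are still missing before the CUDA test is launched. This is a minimal sketch and not part of the repository: `reference_file_names` and `missing_reference_files` are hypothetical names, and the only thing taken from this thread is the `<prefix>-L0-L1-L2-L3` pattern.

```python
from pathlib import Path

# Prefixes taken from the naming convention described above.
PREFIXES = ("piup", "pidn", "sp-s", "sp-r", "sp-u", "sp-m")


def reference_file_names(l0, l1, l2, l3):
    """Return the six expected file names for a local lattice L0 x L1 x L2 x L3."""
    suffix = f"{l0}-{l1}-{l2}-{l3}"
    return [f"{prefix}-{suffix}" for prefix in PREFIXES]


def missing_reference_files(directory, l0, l1, l2, l3):
    """Return the expected files that are absent from `directory` (hypothetical helper)."""
    base = Path(directory)
    return [
        name
        for name in reference_file_names(l0, l1, l2, l3)
        if not (base / name).is_file()
    ]
```

Calling `missing_reference_files("/path/to/the/files", 16, 16, 16, 16)` before running the executable gives an early, readable error instead of a failed read inside the test.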

Read reference values in CUDA code

Done in https://gitlab.com/fastsum/openqcd-fastsum/-/blob/feature/cuda_tests/tests/cuda2/main.c#L65-81
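
Outside the CUDA code, a dumped reference can also be inspected or compared with a result written by the test. The sketch below is only an illustration under stated assumptions: it treats a dump as a flat run of single-precision floats in the machine's native byte order (gdb's `dump binary memory` writes raw memory), and `load_float32` and `max_abs_diff` are hypothetical helpers. The linked main.c may read and compare the data differently.

```python
from array import array


def load_float32(path):
    """Read a raw binary file as a flat sequence of native-endian 32-bit floats."""
    values = array("f")
    with open(path, "rb") as handle:
        # frombytes raises ValueError if the size is not a multiple of the item size
        values.frombytes(handle.read())
    return values


def max_abs_diff(reference, result):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(reference) != len(result):
        raise ValueError("length mismatch")
    return max((abs(a - b) for a, b in zip(reference, result)), default=0.0)
```

A tolerance on `max_abs_diff` is usually a better check than exact equality, since floating-point operations may be ordered differently on the GPU.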

tkoskela commented 2 years ago

Makis' input files: compile_settings.txt, input.in.txt

tkoskela commented 2 years ago

Setting up on CSD3

gdb run to dump reference data

(base) [dc-kosk1@login-e-14 run]$ gdb ../build/qcd1 
(gdb) break Dw.c:1392
Breakpoint 1 at 0x48c64a: file ../modules/dirac/Dw.c, line 1392.
(gdb) break Dw.c:1466
Breakpoint 2 at 0x48d22d: file ../modules/dirac/Dw.c, line 1466.
(gdb) r -i input.in
Starting program: /home/dc-kosk1/git_repos/openqcd-fastsum/run/../build/qcd1 -i input.in
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, openqcd_dirac__Dw (mu=0, s=0x75bb80, r=0x767b80) at ../modules/dirac/Dw.c:1392
1392      if (((cpr[0] == 0) && (bc != 3)) || ((cpr[0] == (NPROC0 - 1)) && (bc == 0))) {
Missing separate debuginfos, use: debuginfo-install glibc-2.17-322.el7_9.x86_64 libgcc-4.8.5-44.el7.x86_64 numactl-devel-2.0.12-5.el7.x86_64
(gdb) dump binary value VOLUME.bin openqcd__VOLUME
(gdb) dump binary value mu.bin mu
(gdb) dump binary memory piup.bin piup piup+openqcd__VOLUME*2
(gdb) dump binary memory pidn.bin pidn pidn+openqcd__VOLUME*2
(gdb) dump binary memory s.bin s s+openqcd__VOLUME
(gdb) dump binary memory r.bin r r+openqcd__VOLUME
(gdb) dump binary memory u.bin u u+openqcd__VOLUME*4
(gdb) dump binary memory m.bin m m+openqcd__VOLUME
(gdb) c
Continuing.

Breakpoint 2, openqcd_dirac__Dw (mu=0, s=0x75bb80, r=0x767b80) at ../modules/dirac/Dw.c:1466
1466      cps_ext_bnd(0x1, r);
(gdb) dump binary memory r2.bin r r+openqcd__VOLUME
(gdb) q

Postprocess
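
One possible postprocessing step is to copy the gdb dumps to the file names the CUDA test expects. The sketch below is an assumption, not the procedure actually used in this thread: `rename_dumps` is a hypothetical helper, and the mapping assumes that `r2.bin` (dumped at the second breakpoint, after the loop) is the output to store under the `sp-r` prefix. `VOLUME.bin`, `mu.bin` and the earlier `r.bin` are left untouched.

```python
import shutil
from pathlib import Path

# Assumed mapping from the gdb dump names used in the session above to the
# prefixes the CUDA test expects. r2.bin is the dump taken after the loop.
DUMP_TO_PREFIX = {
    "piup.bin": "piup",
    "pidn.bin": "pidn",
    "s.bin": "sp-s",
    "r2.bin": "sp-r",
    "u.bin": "sp-u",
    "m.bin": "sp-m",
}


def rename_dumps(src_dir, dst_dir, l0, l1, l2, l3):
    """Copy each gdb dump in src_dir to dst_dir under its reference name."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    suffix = f"{l0}-{l1}-{l2}-{l3}"
    copied = []
    for dump_name, prefix in DUMP_TO_PREFIX.items():
        target = dst / f"{prefix}-{suffix}"
        shutil.copyfile(src / dump_name, target)  # raises if a dump is missing
        copied.append(target.name)
    return copied
```

If the output was dumped as `r.bin` after the loop instead, as in a shorter command list, the `"r2.bin"` key would need to be changed accordingly.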

krishnakumarg1984 commented 2 years ago

break Dw.c:1392
break Dw.c:1466
r -i input.in.txt
dump binary memory piup.bin piup piup+openqcd__VOLUME*2
dump binary memory pidn.bin pidn pidn+openqcd__VOLUME*2
dump binary memory s.bin s s+openqcd__VOLUME
dump binary memory u.bin u u+openqcd__VOLUME*4
dump binary memory m.bin m m+openqcd__VOLUME
c
dump binary memory r.bin r r+openqcd__VOLUME
q