Closed miekkasarki closed 1 month ago
@peyberne A question about these two pragmas that declare the functions on target:
#pragma omp declare target
...
#pragma omp end declare target
These have been (mostly?) removed, but in some cases the first one is replaced with the custom pragma `DECLARE_TARGET`. There's also the corresponding end pragma `DECLARE_TARGET_END`, but I don't see it used anywhere. Can you specify whether the original pragmas should be removed completely, or whether they should be replaced with the custom pragmas? The other pragma, `DECLARE_TARGET_SIMD`, is being used consistently.
There is also some older code, I think, where `#ifndef GPU` is used instead of the custom pragmas. I'll update those.
I replaced some `#pragma omp declare target` with `DECLARE_TARGET` when the corresponding routines are used in GPU regions; it expands to either OpenMP or OpenACC pragmas. For those routines I also added `DECLARE_TARGET_END`, for instance in `src/plasma.h`.
#ifndef GPU
DECLARE_TARGET_SIMD
#else
DECLARE_TARGET
#endif
In this case we can replace these lines with a single line:
DECLARE_TARGET_SIMD
and modify the corresponding pragma for OpenACC in `offload_acc_omp.h`:
#define DECLARE_TARGET_SIMD MY_PRAGMA(acc routine seq)
I can modify the `offload_acc_omp.h` file in this branch if you agree?
Yes, this is a clean solution. Please go ahead.
A few more changes that I plan to make unless there's a reason not to do so:

1. Moving `n_queue_size` to the `particle_simd_fo` struct, since these are always used together, so there is no need to have it as an additional parameter for functions.
2. Removing `#include <immintrin.h>`, or is it needed by nvc? I think that even on the main CPU branch that was legacy code, and I actually thought I had already removed it at some point.
3. Moving all "not implemented for GPUs yet -> abort" checks to a single spot in simulate.c, since we can catch those immediately after the options and all inputs are read.
4. Adding documentation to the custom pragmas in `offload_acc_omp.h`.
5. Enabling the monitor thread in simulate.c when GPU=0.
6. Reverting the changes in `src/simulate/mccc/mccc_wiener.c` and `src/simulate/mccc/mccc_wiener.h`, unless doing that prevents the code from compiling. That module is used only in guiding-center mode, which hasn't been ported yet, so best not to mix it into this PR.
7. Combining the two Makefiles.

From what I've seen so far, there's no need for any major changes. The only thing I haven't properly looked into yet is the actual offloading. I'll go through everything else and then start to think about what to do with that. It could make sense to merge them as is, and then have another PR for removing the old infrastructure.

(Btw, I really like the color-coded error messages in the log. That hadn't even crossed my mind, but I will steal that idea 😉)
Ok, with the new `offload_acc_omp.h` we can proceed as follows:

* For routines where `#pragma omp declare simd (uniform)` is used but NOT declared for GPU: no changes.
* For routines where `#pragma omp declare simd (uniform)` is used and declared for GPU, typically:
#ifndef GPU
DECLARE_TARGET_SIMD_UNIFORM(pls_data)
#else
DECLARE_TARGET
#endif
we can replace it with:
GPU_DECLARE_TARGET_SIMD[_UNIFORM(x)]
I can do it if you agree?
Actually I already started the modifications :-), tell me if I can push them!
Sure, please go ahead.
How about the "bare" acc pragmas which are not behind ACC=1 guards? For example, in src/simulate/step/step_fo_vpa.c we have `#pragma acc data present(h[0:n_queue_size])` where `#pragma omp simd` used to be. Does that still work with OpenMP nonetheless?
`#include <immintrin.h>` is not needed by nvc.

On this point, also notice that `omp nested` is deprecated (it is used in ascot5_main.c).
In the endcond.c file you removed the `#ifndef GPU` guard (lines 220-229):
#ifndef GPU
/* Check if the time spent simulating this marker exceeds the
* given limit*/
if(active_cpumax) {
if(p_f->cputime[i] > sim->endcond_max_cputime) {
p_f->endcond[i] |= endcond_cpumax;
p_f->running[i] = 0;
}
}
#endif
I put this `#ifndef GPU` there because the main loop in simulate_fo_fixed.c takes a lot of time on the GPU, since it computes all the markers on the GPU (instead of NSIMD markers on the CPU). If you remove this condition, the GPU version stops before the end of the simulation. Maybe there is another way to avoid this?
I guess we gave you a test case which has this end condition enabled... It can be disabled in the simulation options. If you install the Python package "a5py" that comes with the source code, you can disable this end condition with the command line tool like this: `a5editoptions ascot.h5` -> `ENDCOND_CPUTIMELIM=0` -> Save and exit (type anything when prompted for a description of the data). Let me know if this works, or I can disable this end condition for you if that's more convenient.

In actual simulations, this end condition is only used to ensure that the simulation finishes in the time that was allocated by SLURM.
@peyberne I'm now content with how the code looks. As we discussed in the meeting, I'd still like to distribute the contents of copytogpu.c to the specific input modules (making it easier to maintain when e.g. the structs are modified). I was able to separate marker offloading from the rest, but I wasn't able to, for example, move this:
GPU_MAP_TO_DEVICE(
    sim->wall_data.w2d.wall_r[0:sim->wall_data.w2d.n],
    sim->wall_data.w2d.wall_z[0:sim->wall_data.w2d.n] )
to the corresponding init in wall_2d.c:
void wall_2d_init(wall_2d_data* w, wall_2d_offload_data* offload_data,
real* offload_array) {
w->n = offload_data->n;
w->wall_r = &offload_array[0];
w->wall_z = &offload_array[offload_data->n];
    GPU_MAP_TO_DEVICE(w->wall_r[0:w->n], w->wall_z[0:w->n])
}
as is my plan. However, as I said, I'm content with how the code is right now, and I'm postponing finishing this relocation until I've first removed the redundant old offloading infrastructure (in a separate PR), which might otherwise interfere.
So for the next steps I propose that:

0. `feature/15-gpu-event-based` branch to this one.
1. `feature/15-gpu-event-based` to the main to continue GPU development.

For point 0, it's fine with me. For point 1, I checked with OpenACC and the results are ok. We have to check with OpenMP offload (on Intel).
Fine, it works!
Release 5.5.5 is out, and I've rebased this branch to 5.5.5 in my local copy, so I'm ready to force-push whenever. Just let me know when you're ready.
Ok, as soon as we have access to the Intel GPU, we'll test OpenMP offload.
The test on the Intel GPU (Ponte Vecchio) with OpenMP offload works well. This PR is ok for us, @miekkasarki!
I've rebased `feature/15-gpu-event-based` to `feature/15-gpu-event-based-staged-for-pr` (at the commit where all tests passed) and rebased `feature/15-gpu-event-based-staged-for-pr` to the current `main`.

This latest version compiles, but the GPU utilization is very low, so something broke during the rebase. I'll try to hunt it down.
These are the offending lines in simulate_fo_fixed.c. Removing them brings the GPU utilization back to the expected level. Now what's wrong here...
random_normal_simd(&sim->random_data, 3*p.n_mrk, rnd);
mccc_fo_euler(p_ptr, hin, &sim->plasma_data, &sim->mccc_data, rnd);
The problem was on my end (wrong compiler flags). Nothing wrong with the code.
I fixed any remaining issues. If the tests pass, I've got no objection for making the release. @peyberne can you run the benchmarks for this branch and then we are ready.
I performed the tests; all is ok on my side!
Merging the GPU-ported code. The ported physics are orbit-following and Coulomb collisions in GO mode, plus the distribution diagnostics.
@peyberne I made a separate branch from feature/15-gpu-event-based which is now under review. I'll make my modifications here where necessary, and then we can go through those together before merging. I'll let you know once I'm finished.