Open okkevaneck opened 3 months ago
Hi @okkevaneck - we have seen these errors before. I have just tested redgreengpu on LUMI-G and I am able to run under my own build/run framework. So the question is: what is different about yours? I'll look into it.
By the way, eckit and fckit are not dependencies of ecTrans, so you don't need to build those.
More generally, so everyone is on the same page, let me summarise the current support of AMD GPUs with ecTrans:
Hi @samhatfield, thank you for the quick reply! Interesting that it's different, let me know if I can provide you with any extra info.
Good to know eckit and fckit are not dependencies; that will shave a bit off our installation time.
Also, many thanks for the overview of the current state. We heard from @reuterbal that we should use the redgreengpu branch, as the main branch is currently not stable on AMD architectures, but it's also good to know about the ongoing developments.
I tried to follow your build instructions, but didn't end up with a working binary. I get the interactive node with
salloc --nodes=1 --tasks=1 --cpus-per-task=32 --account=project_465000454 --gpus-per-task=1 --partition=dev-g --time=00:30:00
(is this wrong?)
Then I execute
srun -n 1 ./install_redgreengpu.sh lumi
The build finishes, but when I look at src/build/ectrans.log, I see
-- HIP target architecture: gfx803
It should be gfx90a. Sure enough, when I test the resulting binary, it doesn't work:
> srun -n 1 ./src/build/ectrans/bin/ectrans-benchmark-gpu-dp
"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
srun: error: nid005006: task 0: Aborted (core dumped)
srun: launch/slurm: _step_signal: Terminating StepId=7971168.2
Is there something I'm missing?
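In case it helps with debugging, here is roughly how I'd check which architecture my allocation actually exposes, plus an untested guess at pinning the target at configure time (assuming ecTrans honours the standard CMake variable here):
# Check the GPU architecture visible inside the job step (should report gfx90a on LUMI-G).
srun -n 1 rocminfo | grep -m1 gfx
# Hypothetical override: pin the HIP target architecture instead of relying on auto-detection.
cmake -DCMAKE_HIP_ARCHITECTURES=gfx90a <other options>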
I allocate the node slightly differently and SSH onto the compute node; maybe that's what's causing the difference.
To allocate a node, I run:
#!/usr/bin/env bash
JOB_NAME="ia_gpu_dev"
GPUS_PER_NODE=8
NODES=1
NTASKS=8
PARTITION="dev-g"
ACCOUNT="project_465000454"
TIME="01:00:00"
# Allocate interactive node with the set variables above.
salloc \
    --gpus-per-node=$GPUS_PER_NODE \
    --exclusive \
    --nodes=$NODES \
    --ntasks=$NTASKS \
    --partition=$PARTITION \
    --account=$ACCOUNT \
    --time=$TIME \
    --mem=0 \
    --job-name=$JOB_NAME
Then to get onto the compute node, I execute the following from a login node:
ROCR_VISIBLE_DEVICES=0 srun --cpu-bind=mask_cpu:0xfe000000000000 --nodes=1 --pty bash -i
And then I execute the script without any SLURM command, as we're already on the compute node:
./install_redgreengpu.sh lumi
I forgot about the ROCR_VISIBLE_DEVICES=0 and --cpu-bind=mask_cpu:0xfe000000000000; I think this is what could cause the behavior you're seeing.
Let me know if it helped!
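If it's useful, here is a quick sanity check of what that variable does (just a sketch; I'm assuming rocminfo reflects the ROCr-level device mask inside the job step):
# Without the mask, rocminfo lists gfx90a agents for every GCD the runtime can see.
srun --ntasks=1 rocminfo | grep gfx90a
# With ROCR_VISIBLE_DEVICES=0, only the first GCD should remain visible to the runtime.
ROCR_VISIBLE_DEVICES=0 srun --ntasks=1 rocminfo | grep gfx90a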
Will give it a go, thanks! I'm waiting quite a long time today to get a node allocated.
Now I see
-- HIP target architecture: gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a gfx90a
which is good. I still found it difficult to get an interactive session on a compute node:
> ROCR_VISIBLE_DEVICES=0 srun --cpu-bind=mask_cpu:0xfe000000000000 --nodes=1 --pty bash -i
srun: Warning: can't honor --ntasks-per-node set to 1 which doesn't match the requested tasks 8 with the number of requested nodes 1. Ignoring --ntasks-per-node.
srun: error: Unable to create step for job 7971505: More processors requested than permitted
Instead I ran
ROCR_VISIBLE_DEVICES=0 srun --ntasks=1 --pty bash -i
Now I've successfully built the binary. And I think I've found the cause of the problem. Could you try running without --nproma $NPROMA?
In my setup, I get the exact same error as you when I include --nproma 32. To be honest, this option is sort of irrelevant for ecTrans benchmarking, because it determines the data layout in grid point space, but no calculations are done in grid point space. We usually don't specify this option at all when benchmarking ecTrans. But we do like to keep the option so we can replicate situations from the IFS (where NPROMA very much has consequences) in ecTrans. Therefore this option should work, and this is clearly a bug!
For now, if you just want to benchmark ecTrans, you can leave this option off. In the meantime I'll try to find the cause of this bug.
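To make the two cases concrete, this is roughly what I mean (a sketch using the flags discussed above; --truncation defaults to 79 if omitted):
# Works: let the benchmark pick its own grid-point data layout.
srun -n 1 ./src/build/ectrans/bin/ectrans-benchmark-gpu-dp
# Currently hits the bug: forcing the grid-point blocking factor.
srun -n 1 ./src/build/ectrans/bin/ectrans-benchmark-gpu-dp --nproma 32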
Hmm, interesting, I wonder why the interactive node works for me...
I tried running without --nproma 32 and it works, thank you very much!
It does make me wonder, how do you alter the workload size with this version? I looked at an older version at the beginning of this year, which had options to scale the workload through the NLAT and NLON variables.
Great to hear it works. I'm figuring out how we might fix this so we can run with any NPROMA. Let's keep this issue open until we decide how to proceed.
With the benchmark program, the problem size in both spectral and grid point space can be set by a single parameter: -t, --truncation. This is the cutoff zonal and total wavenumber in spectral space. The higher this number, the higher the resolution, and the bigger the work arrays.
By default the benchmark driver will use an octahedral grid for grid point space with a cubic-accuracy representation of waves, which basically means the number of latitudes must be 2 * (truncation + 1). -t, --truncation 79 (which is the default if you don't specify the option) therefore gives an octahedral grid with 160 latitudes. The number of longitude points per latitude depends on the latitude: it is greatest at the equator and tapers to 20 at the poles.
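So to scale the workload you only vary the truncation, for example (a sketch; the path matches the build layout mentioned earlier):
# Default resolution: octahedral grid with 2 * (79 + 1) = 160 latitudes.
srun -n 1 ./src/build/ectrans/bin/ectrans-benchmark-gpu-dp --truncation 79
# Higher resolution: 2 * (159 + 1) = 320 latitudes, with correspondingly larger work arrays.
srun -n 1 ./src/build/ectrans/bin/ectrans-benchmark-gpu-dp --truncation 159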
Ah that's how it works! Many thanks Sam!
I've compiled and installed the redgreengpu branch on LUMI-G and ran the ectrans-benchmark-gpu-dp binary. This unfortunately resulted in the following error message:
I'm clueless as to what the problem may be, so I've also included my installation setup as a tar.gz for anyone to try:
ectrans_dwarf.tar.gz
Simply acquire an interactive LUMI-G compute node and execute ./install_redgreengpu.sh. This will clone, build, and install all required sources. Afterwards, go onto a login node and cd into the run directory. Then sbatch the run_sbatch_lumi-g.sh script to get the error output in the err.<slurm_job_id>.0 file within the results/sbatch/ folder.
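For clarity, the reproduction steps boil down to something like this (a sketch; paths follow the layout of the attached archive):
# On an interactive LUMI-G compute node: clone, build, and install everything.
./install_redgreengpu.sh
# Back on a login node: submit the benchmark job and inspect the error output.
cd run
sbatch run_sbatch_lumi-g.sh
cat results/sbatch/err.<slurm_job_id>.0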