fwyzard opened 2 months ago
assign core,heterogeneous
New categories assigned: core,heterogeneous
@Dr15Jones,@fwyzard,@makortel,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks
cms-bot internal usage
A new Issue was created by @fwyzard.
@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
@fwyzard , yes we should be able to implement this via scram b .... How about, for local development (e.g. where the user only wants to test things on the local host), we just add scram build enable-alpaka-native, which on the host would use
- cudaComputeCapabilities to get the actual GPU(s) and only build for those GPU types
- rocmComputeCapabilities to get the actual GPU(s) and only build for those GPU types

We can also add scram b {enable|disable}-alpaka-{rocm|cuda} to explicitly enable/disable the ROCm/CUDA backend build.
If needed, we can discuss this in the core sw meeting tomorrow.
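For illustration, a rough sketch of how the native architectures could be detected on the local host. This is only a sketch, assuming the cudaComputeCapabilities and rocmComputeCapabilities helpers are in the PATH and print the output formats shown later in this thread; it is not the actual scram implementation:

```bash
# Hypothetical sketch (not the actual scram rule): list the GPU architectures
# present on the local host, assuming the CMSSW helpers are available.
# cudaComputeCapabilities prints e.g. "0 8.9 NVIDIA L4",
# rocmComputeCapabilities prints e.g. "0 gfx1100 AMD Radeon Pro W7800".

if command -v cudaComputeCapabilities >/dev/null 2>&1; then
  # "8.9" -> "sm_89"
  cudaComputeCapabilities | awk '{gsub(/\./, "", $2); print "sm_" $2}' | sort -u
fi

if command -v rocmComputeCapabilities >/dev/null 2>&1; then
  rocmComputeCapabilities | awk '{print $2}' | sort -u
fi
```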
Sounds good.
About updating the flags in the cuda.xml and rocm.xml tools:
cuda.xml
The syntax for enabling sm_## is -gencode arch=compute_##,code=[sm_##,compute_##].
So, calling e.g. scram b enable-backend cuda=sm_89 should remove all the CUDA_FLAGS of the form -gencode arch=compute_[0-9]+,code=[sm_[0-9]+,compute_[0-9]+], and add -gencode arch=compute_89,code=[sm_89,compute_89].
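A minimal sketch of that substitution, assuming for illustration that the flags are held in a CUDA_FLAGS shell variable (in practice the change would be applied to the cuda.xml tool file by scram):

```bash
# Hypothetical sketch of what "enable-backend cuda=sm_89" would do to the flags:
# drop every existing -gencode entry and append one for sm_89 only.
CUDA_FLAGS='-O3 -gencode arch=compute_75,code=[sm_75,compute_75] -gencode arch=compute_86,code=[sm_86,compute_86]'

arch=89
CUDA_FLAGS=$(echo "$CUDA_FLAGS" | sed -E 's/-gencode arch=compute_[0-9]+,code=\[sm_[0-9]+,compute_[0-9]+\]//g')
CUDA_FLAGS="$CUDA_FLAGS -gencode arch=compute_${arch},code=[sm_${arch},compute_${arch}]"
echo "$CUDA_FLAGS"
```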
The "native" CUDA architectures used by the NVIDIA GPUs in the local machine can be extracted from cudaComputeCapabilities
:
$ cudaComputeCapabilities
0 8.9 NVIDIA L4
1 7.5 Tesla T4
For example, a machine with only a Tesla T4 should use the architecture sm_75.
Currently there is a script cmsCudaSetup.sh that does part of what scram b enable-backend cuda=native should do.
rocm.xml
The syntax for enabling gfx#### is --offload-arch=gfx####, so scram b enable-backend rocm=gfx1100 should remove all the ROCM_FLAGS of the form --offload-arch=gfx[0-9a-f]+, and add --offload-arch=gfx1100.
Note that the value after gfx can have 3 or 4 hexadecimal digits.
The "native" ROCm architectures used by the AMD GPUs in the local machine can be extracted from rocmComputeCapabilities
:
$ rocmComputeCapabilities
0 gfx1100 AMD Radeon Pro W7800 (unsupported)
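As a small illustration of the corresponding ROCm flag handling, including the 3-or-4-hexadecimal-digit check mentioned above; this is a hypothetical helper, not part of scram:

```bash
# Hypothetical sketch: validate a requested ROCm architecture and print the flag.
# The value could come from e.g. "enable-backend rocm=gfx1100" or from rocmComputeCapabilities.
arch="gfx1100"
if echo "$arch" | grep -Eq '^gfx[0-9a-f]{3,4}$'; then
  echo "--offload-arch=${arch}"
else
  echo "error: '${arch}' does not look like a ROCm gfx architecture" >&2
fi
```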
@fwyzard , thanks for the hints in https://github.com/cms-sw/cmssw/issues/45859#issuecomment-2327024351.
As scram build ... passes everything to gmake as build targets, it is not easy to implement scram build enable-backend cuda: in that case cuda becomes a build target and gmake will try to run it. With scram build enable-backend cuda=sm_89, cuda instead becomes a variable, overriding the value set by the cuda tool. Instead, how about
- scram build {en,dis}able-backend-{cuda,rocm} : to enable/disable the cuda/rocm alpaka backends
- scram build enable-backend-{cuda,rocm}-[comma-separated-compute-capabilities] : e.g. scram build enable-backend-cuda-sm_75 or scram build enable-backend-cuda-sm_75,sm_89 , scram build enable-backend-rocm-gfx1100 or scram build enable-backend-rocm-gfx1100,gfx90a
- scram build enable-backend-cuda-native : to find the native compute capabilities and use those
- scram build enable-backend-cuda-reset : to reset the compute capabilities to their original value (from the release area)
- scram build enable-backend-native : to disable the backend that is not available and call enable-backend-cuda-native for the backend which is available
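A toy illustration of the gmake behaviour described above (not the scram build system): a NAME=VALUE argument is taken as a variable override, while a bare word is taken as a target to build. Makefile.demo is a hypothetical throwaway file:

```bash
# Toy example only: show how GNU make treats extra command-line arguments.
printf 'cuda ?= default\nall:\n\t@echo "building with cuda=$(cuda)"\n' > Makefile.demo

make -f Makefile.demo all cuda=sm_89   # variable override: prints "building with cuda=sm_89"
make -f Makefile.demo all cuda         # "cuda" is treated as a target: "No rule to make target 'cuda'"
```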
I see.
Maybe we could shorten the commands, like
scram build {en,dis}able-{cuda,rocm}
scram build enable-cuda-sm_75
scram build enable-rocm-gfx1100,gfx90a
etc?
And it might be clearer if we split the backend and the individual targets with a ':'
scram build enable-cuda:sm_75
scram build enable-rocm:gfx1100,gfx90a
(I would suggest using '=' but Make would interpret it as setting a variable)
What do you think ?
Sounds good, so I will drop -backend from the target and use ':' for the compute capabilities.
@fwyzard , for now I have enable-alpaka:native to automatically enable/disable the cuda/rocm backends and set the native compute capabilities. Is this a good target name or should I change it to enable-alpaka-native? (enable-native sounds very generic)
Maybe enable-gpus:native ?
But it affects only Alpaka modules, not other modules that may use process.options.accelerators, right? Then enable-alpaka:native may be more correct.
Yes, it only affects the alpaka modules. OK, so I will go with enable-alpaka:native then.
@fwyzard , {en,dis}able-{cuda,rocm} also affect only alpaka; should we change these to {en,dis}able-alpaka:{cuda,rocm} ?
I'm undecided, because then calls like scram b enable-alpaka:cuda:sm_75 start to become complicated. So I'm leaning more towards scram b enable-gpus:native .
Could you implement that, and later today we ask @makortel his opinion ?
As enable-{cuda,rocm}:capabilities only affects cuda/rocm directly, those calls can remain enable-{cuda,rocm}:capability .
What about disable-cuda ?
Currently disable-cuda only disables the alpaka-cuda backend. It does not disable the cuda build rules, so scram will still compile .cu files for non-alpaka packages.
But if we want disable-cuda to disable both the alpaka-cuda backend and also stop building .cu files, then I can do it, but I think for now that will break builds (there are packages which have GPU code dependencies).
OK, let me try to summarise:
- scram b disable-cuda : disables the CUDA alpaka backend; the regular .cu files are still built
- scram b disable-rocm : disables the ROCm alpaka backend; the regular .hip.cc files are still built
- scram b enable-cuda : enables the CUDA alpaka backend; the regular .cu files are still built
- scram b enable-cuda:sm_90 : updates the cuda.xml tool file to support (only) the sm_90 architecture, which also applies to the regular .cu files
- scram b enable-cuda:native : runs cudaComputeCapabilities to determine the architecture of the NVIDIA GPUs in the system and updates the cuda.xml tool file to support (only) these architectures, which also apply to the regular .cu files
- scram b enable-rocm , enable-rocm:gfx1100 , enable-rocm:native : same as above, but for the .hip.cc files and the ROCm alpaka backend
- scram b enable-alpaka:native : enables only the alpaka backends matching the GPUs found in the system; the regular .cu and .hip.cc files are still built

Is it correct ?
Basically, it would never affect whether the regular .cu and .hip.cc files are built (other than which architecture is built), only whether the alpaka backends are built or not.
So I think I would prefer scram b enable-gpus:native :-)
And, once https://github.com/cms-sw/cmssw/issues/45844 is complete, we could revisit this

    "currently disable-cuda only disables the alpaka-cuda backend. It does not disable the cuda build rules so scram will still compile .cu files for non-alpaka packages"

and try to disable the CUDA or ROCm backends completely.
Is it correct ?
Yes, this is correct.
    "So I think I would prefer scram b enable-gpus:native"

OK
https://github.com/cms-sw/cmssw-config/pull/110 should implement these new rules. scram build help in a dev area should show these new build rules.
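For reference, a possible usage sequence in a developer area; the target names follow the discussion above, and the exact spellings are defined by the cms-sw/cmssw-config implementation:

```bash
# Possible usage in a CMSSW developer area (target names as discussed above).
cd "$CMSSW_BASE/src"
scram build help               # list the available build rules
scram b disable-rocm           # skip the ROCm alpaka backend to speed up local builds
scram b enable-cuda:sm_75      # build the CUDA backend only for sm_75 (e.g. a Tesla T4)
scram b enable-gpus:native     # keep only the backends/architectures of the local GPUs
scram b -j 8                   # rebuild with the new settings
```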
I'd find it clearest if the {enable,disable}-{cuda,rocm} and enable-gpus:native would apply equally to the compilation of .cu and .hip.cc files as well. But to be practical, I'm ok with leaving that to the time #45844 becomes complete.
The ROCm (and to some extent CUDA) alpaka backends add a noticeable amount to the time it takes to build some packages.
For users that do not care about running on (AMD) GPUs, we could speed up the compilation process by disabling the ROCm (or CUDA) alpaka backend(s).
Also note that it could be much worse if we manage to add the SYCL/oneAPI backend...
This could be implemented in scram, with a syntax like ... ?
Another way to speed up the compilation would be to target only one actual GPU type, like an NVIDIA T4 or an AMD MI250.
This could be implemented with a syntax like ...
We could also get the hardware type from cudaComputeCapabilities or rocmComputeCapabilities, with a syntax like ...
@smuzaffar do you think this could be implemented in scram ?
If you think so, we can discuss the implementation details here or in person.