fredt00 opened this issue 2 months ago
Hi Fred,
There are a few things that jump out at me in your scripts:
1) neither of them loads an MPI module,
2) the compile script loads the CUDA module but the run script doesn't,
3) you're starting Python using mpirun.
If MPI is available on the machine without a module and that's the one you want to use, then point 1) should be okay.
For point 2), this could be causing you to use a different CUDA when compiling (the one from the module) than when running (some other version on the system), and that tends to cause problems. It's important to run in the same environment that you've compiled in.
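A quick way to see what the run environment actually resolves, as a minimal sketch (this assumes Linux and an Lmod/Environment Modules setup, which sets LOADEDMODULES; find_library may print only a soname):

import ctypes.util
import os

# What the module system loaded in this shell (Lmod/Environment Modules set this).
print("LOADEDMODULES:", os.environ.get("LOADEDMODULES", "<none>"))
# Which CUDA runtime the dynamic linker would resolve right now.
print("libcudart:", ctypes.util.find_library("cudart"))

If the two scripts print different things, they're running in different environments.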
I think point 3) is the cause of the "mpiexec does not support recursive calls" message. AMUSE uses MPI in a different way than most applications: instead of having many copies of your script running in parallel, there's only one copy, which dynamically creates parallel community code instances as needed within your allocation. So you should start your script without mpirun; AMUSE will call it itself if needed (it has other ways of starting models too).
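As a minimal sketch of that workflow (FastKick here only because it's one of the codes you mention; the particle numbers and units are placeholders, and number_of_workers is passed the same way for any community code), something like this is started with plain python, and AMUSE spawns the workers itself:

from amuse.units import nbody_system, units
from amuse.ic.plummer import new_plummer_model
from amuse.community.fastkick.interface import FastKick

# Started with plain `python script.py` inside the job allocation, not mpirun.
converter = nbody_system.nbody_to_si(1e4 | units.MSun, 1 | units.parsec)
particles = new_plummer_model(1000, convert_nbody=converter)

# AMUSE itself spawns the MPI worker processes for this code.
gravity = FastKick(converter, number_of_workers=4)
gravity.particles.add_particles(particles)
print("worker(s) running for", len(gravity.particles), "particles")
gravity.stop()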
Oh, and about ./configure failing to detect the CUDA libraries, that's an interesting one. I'm currently working on the build system, and I've rewritten the CUDA detection logic because CUDA has changed over time and it could use an update. I'm going to check that the new system works with this directory layout, and if it doesn't, fix it.
Thanks for reporting this even if you worked around it already; it's much better to fix things like this on the AMUSE side, where the fix helps everyone else too.
Thanks for the advice! In terms of your suggestions:
For 1), these are the modules I have loaded:
 1) GCCcore/11.3.0
 2) zlib/1.2.12-GCCcore-11.3.0
 3) binutils/2.38-GCCcore-11.3.0
 4) GCC/11.3.0
 5) numactl/2.0.14-GCCcore-11.3.0
 6) XZ/5.2.5-GCCcore-11.3.0
 7) libxml2/2.9.13-GCCcore-11.3.0
 8) libpciaccess/0.16-GCCcore-11.3.0
 9) hwloc/2.7.1-GCCcore-11.3.0
10) OpenSSL/1.1
11) libevent/2.1.12-GCCcore-11.3.0
12) UCX/1.12.1-GCCcore-11.3.0
13) libfabric/1.15.1-GCCcore-11.3.0
14) PMIx/4.1.2-GCCcore-11.3.0
15) UCC/1.0.0-GCCcore-11.3.0
16) OpenMPI/4.1.4-GCC-11.3.0
17) OpenBLAS/0.3.20-GCC-11.3.0
18) FlexiBLAS/3.2.0-GCC-11.3.0
19) FFTW/3.3.10-GC C-11.3.0
20) gompi/2022a
21) FFTW.MPI/3.3.10-gompi-2022a
22) ScaLAPACK/2.2.0-gompi-2022a-fb
23) foss/2022a
2) and 3) are both good points. I've removed mpirun and loaded CUDA in the run script, and now I just get the warning:
/home/oxfd1327/soft/amuse-gpu/amuse/src/amuse/rfi/core.py:964: UserWarning: MPI (unexpectedly?) not available, falling back to sockets channel
  warnings.warn("MPI (unexpectedly?) not available, falling back to sockets channel")
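For what it's worth, that warning fires when the MPI channel isn't usable from Python; a quick probe (a sketch, assuming mpi4py is the piece the channel needs, which is the usual cause):

# If this import fails, AMUSE falls back to the sockets channel as in the warning above.
try:
    from mpi4py import MPI
    print("mpi4py OK:", MPI.get_vendor())
except ImportError as exc:
    print("mpi4py not importable:", exc)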
And my code runs, although I don't see any speed-up compared to when I configured without GPUs, so I'm wondering if it is configured correctly. Do you know of a way to confirm the GPU utilisation? Running nvidia-smi before the python call shows that I am being allocated the requested GPUs, but I can't see any information about their usage with seff, for example.
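One way to confirm utilisation from inside the job (a sketch; the interval and sample count are arbitrary) is to poll nvidia-smi in the background while the simulation runs:

import subprocess
import time

# Print GPU utilisation a few times; run this alongside the simulation.
for _ in range(6):
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True)
    print(out.stdout.strip())
    time.sleep(10)

If utilisation stays at 0% throughout, the workers aren't touching the GPU.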
In my script petar is called with:
self.bound = code(self.converter, mode='gpu', number_of_workers=code_number_of_workers)
Is this the correct way to get petar to use GPUs? I can't see any mention of GPUs in the petar interface files.
PeTar in AMUSE currently doesn't use the GPU; enabling it would require at least manually modifying the Makefile, and probably more modifications than that.
Ah ok, that makes sense. I've been trying to see if FastKick will run on the GPUs, but strangely I get this error every few bridge timesteps:
Traceback (most recent call last):
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/examples/fred/galaxy_cluster_master.py", line 353, in <module>
    main(**o.__dict__)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/examples/fred/galaxy_cluster_master.py", line 268, in main
    integrator.evolve_model(time)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/couple/bridge.py", line 598, in evolve_model
    return self.evolve_joined_leapfrog(tend, timestep)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/couple/bridge.py", line 624, in evolve_joined_leapfrog
    self.kick_codes(timestep / 2.0)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/couple/bridge.py", line 756, in kick_codes
    de += x.kick(dt)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/couple/bridge.py", line 478, in kick
    self.kick_with_field_code(
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/couple/bridge.py", line 516, in kick_with_field_code
    ax,ay,az=field_code.get_gravity_at_point(
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/couple/bridge.py", line 146, in get_gravity_at_point
    return code.get_gravity_at_point(radius, x, y, z)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/methods.py", line 168, in __call__
    result = self.method(*list_arguments, **keyword_arguments)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/methods.py", line 166, in __call__
    object = self.precall()
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/methods.py", line 215, in precall
    return self.definition.precall(self)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/interface.py", line 373, in precall
    transition.do()
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/state.py", line 123, in do
    self.method.new_method()()
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/methods.py", line 168, in __call__
    result = self.method(*list_arguments, **keyword_arguments)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/methods.py", line 168, in __call__
    result = self.method(*list_arguments, **keyword_arguments)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/methods.py", line 168, in __call__
    result = self.method(*list_arguments, **keyword_arguments)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/methods.py", line 170, in __call__
    result = self.convert_result(result)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/methods.py", line 209, in convert_result
    return self.definition.convert_result(self, result)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/interface.py", line 682, in convert_result
    return self.handle_return_value(method, result)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/interface.py", line 614, in handle_as_unit
    unit.append_result_value(method, self, value, result)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/interface.py", line 70, in append_result_value
    self.convert_result_value(method, definition, value)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/interface.py", line 80, in convert_result_value
    definition.handle_errorcode(errorcode)
  File "/cosma/home/dp016/dc-thom14/soft/amuse-gpu/amuse/src/amuse/support/interface.py", line 586, in handle_errorcode
    raise exceptions.AmuseException(
amuse.support.exceptions.AmuseException: Error when calling 'commit_particles' of a '<class 'amuse.community.fastkick.interface.FastKick'>', errorcode is -3
It seems to happen randomly, but usually after the third bridge timestep. Any idea what could be causing this? Is it a GPU configuration problem? It doesn't seem to happen with mode='cpu'.
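In case it helps narrow it down, here's the kind of minimal isolation test I'd try (a sketch; model size and units are placeholders): drive FastKick in GPU mode outside the bridge and recreate it repeatedly, since the traceback shows commit_particles being triggered from the kick:

from amuse.units import nbody_system, units
from amuse.ic.plummer import new_plummer_model
from amuse.community.fastkick.interface import FastKick

converter = nbody_system.nbody_to_si(1e4 | units.MSun, 1 | units.parsec)
particles = new_plummer_model(1000, convert_nbody=converter)
eps = particles.x * 0  # zero softening, one value per field point

# The failure appears only every few bridge steps, so recreate the code repeatedly.
for trial in range(10):
    field = FastKick(converter, mode='gpu')
    field.particles.add_particles(particles)
    # commit_particles is triggered implicitly by the state machine here,
    # just as in the bridge traceback above.
    ax, ay, az = field.get_gravity_at_point(eps, particles.x, particles.y, particles.z)
    print("trial", trial, "ok, ax[0] =", ax[0])
    field.stop()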
Hi,
I'm trying to get AMUSE up and running with GPUs but haven't had any success; specifically, I want petar and fastkick to run on GPUs. I've been using this script to build AMUSE:
But it always fails at ./configure, complaining that
configure: error: cannot find cuda runtime libraries in /apps/system/easybuild/software/CUDA/11.7.0/lib /apps/system/easybuild/software/CUDA/11.7.0/lib64
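For reference, a quick look at what actually sits in the two directories the error names (a sketch; newer CUDA layouts sometimes keep the runtime only under lib64 or targets/x86_64-linux/lib):

import glob

# List the CUDA runtime libraries in the directories configure checked.
for d in ("/apps/system/easybuild/software/CUDA/11.7.0/lib",
          "/apps/system/easybuild/software/CUDA/11.7.0/lib64"):
    print(d, "->", glob.glob(d + "/libcudart*"))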
This slightly convoluted installation seems to be the only way to get MPI working correctly for the non-GPU installation, which only works at runtime if I use miniconda as above.
I tried running just
./configure
and then manually editing config.mk, and the GPU versions of the codes built successfully. However, when I then run them with this script:
And I get the error:
Is there anything obviously wrong with this process? Any help would be greatly appreciated!
Cheers, Fred