manuel-g-castro closed this issue 2 months ago.
I got a different result, @manuel-g-castro, using your test.py and the command you used on glogin1:
$ launch_compss --sc_cfg=mn.cfg --master_node="$SLURMD_NODENAME" --worker_nodes="" --worker_in_master_cpus=48 --lang="python" --pythonpath=$(pwd) test.py
Missing master node parameter
Port 43271 is already in use or time_wait, incrementing port by 1
Port 43272 is already in use or time_wait, incrementing port by 1
Port 43273 is already in use or time_wait, incrementing port by 1
srun: error: No account specified, please specify an account
srun: error: Unable to allocate resources: Unspecified error
------ Launching COMPSs application ------
No master to run...
I logged in and tried your commands to load the modules too.
$ module list
Currently Loaded Modules:
1) intel/2023.2.0 4) ucx/1.15.0 7) python/3.8.18
2) impi/2021.10.0 5) oneapi/2023.2.0 8) papi/7.1.0-gcc
3) mkl/2023.2.0 6) bsc/1.0 9) COMPSs/3.3
@kinow and @manuel-g-castro, I suppose you are submitting the script that includes launch_compss with sbatch. Due to the changes in MareNostrum 5, job submissions must specify the account with the -A or --account flag.
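A minimal sketch of that rule, assuming you generate submission commands from Python: the hypothetical helper `build_submission` refuses to build an `sbatch`/`salloc` command line without an account. The script name `run_compss.sh` is a placeholder, and `bsc32`/`gp_bsces` are simply the values used elsewhere in this thread.

```python
import shlex


def build_submission(command, script, account, qos=None, extra=()):
    """Return an argv list for sbatch/salloc that always includes --account.

    Hypothetical helper: the account and QoS values depend on your project.
    """
    if command not in ("sbatch", "salloc"):
        raise ValueError(f"unsupported command: {command!r}")
    if not account:
        # MN5 rejects allocations without an account
        # ("No account specified, please specify an account").
        raise ValueError("an account (-A/--account) is required on MareNostrum 5")
    argv = [command, f"--account={account}"]
    if qos:
        argv.append(f"--qos={qos}")
    argv.extend(extra)
    if script:
        # salloc is typically run without a script; sbatch is given one.
        argv.append(script)
    return argv


if __name__ == "__main__":
    # run_compss.sh is a placeholder for a job script that calls launch_compss.
    print(shlex.join(build_submission("sbatch", "run_compss.sh", "bsc32", qos="gp_bsces")))
    print(shlex.join(build_submission("salloc", None, "bsc32", qos="gp_bsces",
                                      extra=["-N", "1", "-t", "30:00"])))
```

Raising early keeps the failure on your side with a clear message, instead of a generic "Unable to allocate resources" from Slurm.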
@manuel-g-castro, could you try adding module load hdf5? In principle it shouldn't be required, but I think it was loaded during compilation because python/3.12.1 needs it on MN5, and something has been linked against this library.
Hey, @jorgee, thank you for the fast answer. I am checking this issue after the long weekend.
There is a fundamental piece of information that I forgot to mention (my bad): I run all of this within an interactive Slurm session.
So the first step to reproduce the error is to execute salloc -N 1 -t 30:00 --account=bsc32 --qos=gp_bsces.
I am now trying with hdf5 loaded as well, and it is still failing. My understanding is that I do not need to specify any account, since resources have already been allocated. Am I right?
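To check from inside the session whether you really are in an allocation, and which account it was charged to, a small sketch like the following can help. It assumes Slurm exports `SLURM_JOB_ID` in `salloc`/`sbatch` allocations; whether `SLURM_JOB_ACCOUNT` is also exported depends on the Slurm version and site configuration, so it may come back as `None`. The helper name is hypothetical, and it does not tell you whether `launch_compss` inherits the account when it calls `srun`.

```python
import os


def slurm_context(environ=None):
    """Summarise the Slurm allocation the current shell belongs to, if any."""
    env = os.environ if environ is None else environ
    job_id = env.get("SLURM_JOB_ID")
    return {
        "inside_allocation": job_id is not None,
        "job_id": job_id,
        # May be None if the site/Slurm version does not export it.
        "account": env.get("SLURM_JOB_ACCOUNT"),
        "node": env.get("SLURMD_NODENAME"),
    }


if __name__ == "__main__":
    ctx = slurm_context()
    if ctx["inside_allocation"]:
        print(f"Inside job {ctx['job_id']} on {ctx['node']}, account={ctx['account']}")
    else:
        print("Not inside a Slurm allocation: sbatch/salloc will need --account")
```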
The command that I am executing:
[bsc032371@gs08r2b69 compss-test]$ launch_compss --sc_cfg=mn.cfg --master_node="$SLURMD_NODENAME" --worker_nodes="" --worker_in_master_cpus=48 --lang="python" --pythonpath=$(pwd) $(pwd)/test.py
------ Launching COMPSs application ------
[ INFO ] Using default execution type: compss
----------------- Executing test.py --------------------------
Traceback (most recent call last):
File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 713, in <module>
compss_main()
File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 237, in compss_main
compss_start(log_level, tracing, False)
File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/api/api.py", line 121, in compss_start
__start_runtime__(log_level, tracing, interactive, disable_external)
File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/binding.py", line 90, in start_runtime
COMPSs.load_runtime(external_process=False)
File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/COMPSs.py", line 82, in load_runtime
self.compss = establish_link(_logger)
^^^^^^^^^^^^^^^^^^^^^^^
File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/link/direct.py", line 50, in establish_link
import compss # pylint: disable=import-outside-toplevel
^^^^^^^^^^^^^
ImportError: libcilkrts.so.5: cannot open shared object file: No such file or directory
Error running application
Master execution failed. Exiting job.
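This `libcilkrts.so.5` error, like the later `libhdf5.so.310` one, comes from the dynamic loader failing to resolve a library needed (directly or indirectly) by the compiled `compss` module. A hedged way to test which libraries the current environment can load, outside COMPSs, is to `dlopen` them through `ctypes`. The function name is hypothetical, and the check ignores any RPATH/RUNPATH embedded in the extension itself, so a library reported as missing here could still be found by `compss`.

```python
import ctypes


def missing_shared_libraries(names):
    """Return the subset of `names` that the dynamic loader cannot open.

    ctypes.CDLL calls dlopen(), which searches LD_LIBRARY_PATH, the ld.so
    cache and the default system directories.
    """
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Library names taken from the ImportError messages in this thread.
    for lib in missing_shared_libraries(["libcilkrts.so.5", "libhdf5.so.310"]):
        print(f"cannot load {lib}: check which module provides it")
```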
These are my loaded modules:
[bsc032371@gs08r2b69 compss-test]$ module list
Currently Loaded Modules:
1) mkl/2023.2.0 3) oneapi/2023.2.0 5) intel/2024.1 7) hdf5/1.14.4.2 9) python/3.12.1 11) COMPSs/3.3
2) ucx/1.15.0 4) bsc/1.0 6) impi/2021.12 8) sqlite3/3.45.2 10) papi/7.1.0-gcc
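The module lists quoted in this thread differ in several places (for example `intel/2023.2.0` vs `intel/2024.1`, `python/3.8.18` vs `python/3.12.1`). A small, hypothetical helper that parses the text printed by `module list` and reports the differences makes such comparisons less error-prone; it assumes the `N) name/version` layout shown above.

```python
import re

# Each entry in `module list` output looks like "3) mkl/2023.2.0".
_ENTRY = re.compile(r"\d+\)\s+(\S+)")


def parse_module_list(text):
    """Map module name -> version from the text printed by `module list`."""
    modules = {}
    for entry in _ENTRY.findall(text):
        name, _, version = entry.partition("/")
        modules[name] = version
    return modules


def diff_modules(a, b):
    """Return {name: (version_in_a, version_in_b)} for every module that differs."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # The two module lists quoted in this thread.
    list_a = """
    1) intel/2023.2.0 4) ucx/1.15.0 7) python/3.8.18
    2) impi/2021.10.0 5) oneapi/2023.2.0 8) papi/7.1.0-gcc
    3) mkl/2023.2.0 6) bsc/1.0 9) COMPSs/3.3
    """
    list_b = """
    1) mkl/2023.2.0 3) oneapi/2023.2.0 5) intel/2024.1 7) hdf5/1.14.4.2 9) python/3.12.1 11) COMPSs/3.3
    2) ucx/1.15.0 4) bsc/1.0 6) impi/2021.12 8) sqlite3/3.45.2 10) papi/7.1.0-gcc
    """
    changes = diff_modules(parse_module_list(list_a), parse_module_list(list_b))
    for name, (old, new) in changes.items():
        print(f"{name}: {old} -> {new}")
```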
Logged in to glogin4, then tried the same as you, @manuel-g-castro, but also loading impi, intel, and hdf5:
$ salloc -N 1 -t 30:00 --account=bsc32 --qos=gp_bsces
$ module load python/3.8.18
$ module load COMPSs/3.3
$ module load impi/2021.12 intel/2024.1 hdf5/1.14.4.2
$ file test.py
test.py: Python script, ASCII text executable
$ launch_compss \
--sc_cfg=mn.cfg \
--master_node="$SLURMD_NODENAME" \
--worker_nodes="" \
--worker_in_master_cpus=48 \
--lang="python" \
--pythonpath=$(pwd) \
test.py
------ Launching COMPSs application ------
[ INFO ] Using default execution type: compss
----------------- Executing test.py --------------------------
WARNING: Import ERROR importing Numpy
Traceback (most recent call last):
File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 713, in <module>
compss_main()
File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 237, in compss_main
compss_start(log_level, tracing, False)
File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/api/api.py", line 121, in compss_start
__start_runtime__(log_level, tracing, interactive, disable_external)
File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/binding.py", line 90, in start_runtime
COMPSs.load_runtime(external_process=False)
File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/COMPSs.py", line 82, in load_runtime
self.compss = establish_link(_logger)
File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/link/direct.py", line 50, in establish_link
import compss # pylint: disable=import-outside-toplevel
ImportError: libhdf5.so.310: cannot open shared object file: No such file or directory
Error running application
Master execution failed. Exiting job.
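Here `hdf5/1.14.4.2` was loaded and `libhdf5.so.310` was still not found. A sketch for checking whether any directory on `LD_LIBRARY_PATH` actually contains that exact file name; the function name is hypothetical, and only `LD_LIBRARY_PATH` is searched, not the `ld.so` cache or the system directories.

```python
import os


def find_in_library_path(filename, ld_library_path=None):
    """Return the directories on LD_LIBRARY_PATH that contain `filename`.

    Environment modules usually make their libraries visible by prepending
    to LD_LIBRARY_PATH, so that is the only search path considered here.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_library_path.split(os.pathsep):
        # Empty entries (e.g. from a trailing ":") are skipped.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    hits = find_in_library_path("libhdf5.so.310")
    print(hits or "libhdf5.so.310 not found on LD_LIBRARY_PATH")
```

An empty result after loading an hdf5 module suggests that module does not ship the exact soname the failing binary asks for, which is a useful detail to include in a support ticket.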
There might be some other combination of modules that works, but I guess @jorgee knows better which ones have to be loaded for your script to work.
It also happens without COMPSs. Please report it to support@bsc.es:
[bsc019611@glogin2 ~]$ module load python/3.8.18
load PYTHON/3.8.18 (PATH, MANPATH, LD_LIBRARY_PATH, LIBRARY_PATH, PKG_CONFIG_PATH, C_INCLUDE_PATH, CPLUS_INCLUDE_PATH, PYTHONHOME, PYTHONPATH)
[bsc019611@glogin2 ~]$ python3
Python 3.8.18 (default, Feb 7 2024, 09:13:21)
[Clang 17.0.0 (icx 2024.0.0.20231017)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
File "/apps/GPP/PYTHON/3.8.18/INTEL/lib/python3.8/site-packages/numpy/core/__init__.py", line 23, in <module>
from . import multiarray
File "/apps/GPP/PYTHON/3.8.18/INTEL/lib/python3.8/site-packages/numpy/core/multiarray.py", line 10, in <module>
from . import overrides
File "/apps/GPP/PYTHON/3.8.18/INTEL/lib/python3.8/site-packages/numpy/core/overrides.py", line 6, in <module>
from numpy.core._multiarray_umath import (
ImportError: libhdf5.so.310: cannot open shared object file: No such file or directory
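Reproducing the failure outside COMPSs, as above, is a good way to separate installation problems from runtime problems. A minimal sketch of that check as a script (the name `try_import` is hypothetical): run it with the same modules loaded, and if `numpy` fails here too, the problem most likely lies in the Python/library setup rather than in COMPSs.

```python
import importlib
import sys


def try_import(module_name):
    """Import `module_name`; return None on success, or the error text."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Module names can be passed on the command line; numpy is the default.
    for name in sys.argv[1:] or ["numpy"]:
        error = try_import(name)
        print(f"{name}: {'OK' if error is None else error}")
```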
I think they have changed the default intel modules. Are you using COMPSS_PYTHON_VERSION? It would be great if you sent these MareNostrum-related errors to support-compss@bsc.es, as they are more related to the installation on the supercomputer than to errors in the code.
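When reporting these errors, it helps to include which interpreter is actually running and what `COMPSS_PYTHON_VERSION` is set to. A hedged sketch that only collects and prints those values; it does not assume any particular format for `COMPSS_PYTHON_VERSION`, and the helper name is hypothetical.

```python
import os
import shutil
import sys


def python_environment(environ=None):
    """Collect the Python-related settings worth quoting in a support ticket.

    COMPSS_PYTHON_VERSION is only read and reported; no format is assumed.
    """
    env = os.environ if environ is None else environ
    return {
        "running_interpreter": sys.executable,
        "running_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # First python3 on PATH, which may differ from the running interpreter.
        "python3_on_path": shutil.which("python3", path=env.get("PATH", "")),
        "COMPSS_PYTHON_VERSION": env.get("COMPSS_PYTHON_VERSION"),
        "PYTHONHOME": env.get("PYTHONHOME"),
    }


if __name__ == "__main__":
    for key, value in python_environment().items():
        print(f"{key}: {value}")
```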
Hello, @jorgee. Sorry for the late response; I was stuck with Ph.D. bureaucracy for a while.
I read in your documentation that it should not be necessary to set that variable if you load the Python module before COMPSs.
Anyhow, I tried again and the issue persisted, so I contacted support and put you and Daniele in CC (I hope it's not too much of a bother).
Thank you.
Closing this issue because Javier Conejero has answered: I needed to load the hdf5 library.
Component
RUNTIME / PYTHON BINDING
Both, because I am unsure.
Environment
Description
When testing the simple script that Jorge provided to me on MareNostrum 5, it fails with an error when importing numpy.
Minimal example to reproduce
Execute the following script,
where test.py is
Exception