bsc-wdc / compss

COMP Superscalar (COMPSs) is a framework which aims to ease the development and execution of applications for distributed infrastructures, such as Clusters, Grids and Clouds.
https://compss.bsc.es
Apache License 2.0

Importing Numpy Error with Hello World Script on MN5 #14

Closed: manuel-g-castro closed this issue 2 months ago

manuel-g-castro commented 5 months ago

Component

RUNTIME / PYTHON BINDING

Both, because I am unsure which one is at fault.

Environment

Description

When I test the simple script that Jorge provided on MareNostrum 5, it fails with an error while importing NumPy.

Minimal example to reproduce

Execute the following script

module load python/3.8.18
module load COMPSs/3.3

launch_compss \
    --sc_cfg=mn.cfg \
    --master_node="$SLURMD_NODENAME" \
    --worker_nodes="" \
    --worker_in_master_cpus=48 \
    --lang="python" \
    --pythonpath=$(pwd) \
    test.py

where test.py is

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from pycompss.api.api import compss_wait_on
from pycompss.api.task import task
from pycompss.api.parameter import *

@task()
def hello(name):
    return "Hello " + name

if __name__ == '__main__':
    res = hello("world")
    res = compss_wait_on(res)
    print(res)

Exception

WARNING: Import ERROR importing Numpy
Traceback (most recent call last):
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 713, in <module>
    compss_main()
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 237, in compss_main
    compss_start(log_level, tracing, False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/api/api.py", line 121, in compss_start
    __start_runtime__(log_level, tracing, interactive, disable_external)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/binding.py", line 90, in start_runtime
    COMPSs.load_runtime(external_process=False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/COMPSs.py", line 82, in load_runtime
    self.compss = establish_link(_logger)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/link/direct.py", line 50, in establish_link
    import compss  # pylint: disable=import-outside-toplevel
ImportError: libhdf5.so.310: cannot open shared object file: No such file or directory
Error running application
Master execution failed. Exiting job.
kinow commented 5 months ago

I had a different result, @manuel-g-castro, using your test.py and the command you used on glogin1.

$ launch_compss     --sc_cfg=mn.cfg     --master_node="$SLURMD_NODENAME"     --worker_nodes=""     --worker_in_master_cpus=48     --lang="python"     --pythonpath=$(pwd)     test.py
Missing master node parameter

Port 43271 is already in use or time_wait, incrementing port by 1
Port 43272 is already in use or time_wait, incrementing port by 1
Port 43273 is already in use or time_wait, incrementing port by 1
srun: error: No account specified, please specify an account
srun: error: Unable to allocate resources: Unspecified error
------ Launching COMPSs application ------
No master to run...

I logged in and tried your commands to load the modules too.

$ module list

Currently Loaded Modules:
  1) intel/2023.2.0   4) ucx/1.15.0        7) python/3.8.18
  2) impi/2021.10.0   5) oneapi/2023.2.0   8) papi/7.1.0-gcc
  3) mkl/2023.2.0     6) bsc/1.0           9) COMPSs/3.3
jorgee commented 5 months ago

@kinow and @manuel-g-castro, I suppose you are submitting the script that calls launch_compss with sbatch. Due to the changes in MareNostrum 5, job submissions must now specify the account with the -A or --account flag.

jorgee commented 5 months ago

@manuel-g-castro, could you try adding module load hdf5? In principle it shouldn't be required, but I think the module was loaded during compilation because python/3.12.1 on MN5 needs it, and something has been linked against this library.
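Before relaunching, one way to check whether loading the hdf5 module actually makes the missing library visible is to ask the dynamic loader to open it from the same shell session. The sketch below is not part of COMPSs or the MN5 module system; it assumes a Linux node and uses only the Python standard library (ctypes). The helper name can_load is hypothetical, and the default soname libhdf5.so.310 is taken from the traceback above.

```python
import ctypes
import sys


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open `soname` in this process.

    ctypes.CDLL goes through dlopen's normal search (LD_LIBRARY_PATH, the
    ld.so cache, default system directories). The RPATH/RUNPATH of the
    compss extension itself is not considered, so this is an approximation.
    """
    try:
        ctypes.CDLL(soname)
    except OSError:
        # dlopen failed, e.g. "cannot open shared object file"
        return False
    return True


if __name__ == "__main__":
    # Default soname comes from the traceback in this issue; pass others as arguments.
    names = sys.argv[1:] or ["libhdf5.so.310"]
    for name in names:
        status = "OK" if can_load(name) else "NOT FOUND"
        print(f"{name}: {status}")
```

For example, saving it under a hypothetical name such as check_libs.py and running python3 check_libs.py libhdf5.so.310 libcilkrts.so.5 after each set of module load commands should indicate whether that module combination resolves both libraries. Because LD_LIBRARY_PATH is read when a process starts, the check has to run in a fresh interpreter after the modules are loaded.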

manuel-g-castro commented 5 months ago

Hey, @jorgee, thank you for the fast answer. I am getting back to this issue after the long weekend.

There is a fundamental piece of information that I forgot to mention (my bad): I run all of this within an interactive Slurm session.

So the first step to reproduce the error is to execute salloc -N 1 -t 30:00 --account=bsc32 --qos=gp_bsces.

I have now tried again after loading the hdf5 module, and it still fails. My understanding is that I do not need to specify any account, since the resources are already allocated. Am I right?

The command that I am executing:

[bsc032371@gs08r2b69 compss-test]$ launch_compss     --sc_cfg=mn.cfg     --master_node="$SLURMD_NODENAME"     --worker_nodes=""     --worker_in_master_cpus=48     --lang="python"     --pythonpath=$(pwd) $(pwd)/test.py 
------ Launching COMPSs application ------
[ INFO ] Using default execution type: compss

----------------- Executing test.py --------------------------

Traceback (most recent call last):
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 713, in <module>
    compss_main()
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 237, in compss_main
    compss_start(log_level, tracing, False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/api/api.py", line 121, in compss_start
    __start_runtime__(log_level, tracing, interactive, disable_external)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/binding.py", line 90, in start_runtime
    COMPSs.load_runtime(external_process=False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/COMPSs.py", line 82, in load_runtime
    self.compss = establish_link(_logger)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/link/direct.py", line 50, in establish_link
    import compss  # pylint: disable=import-outside-toplevel
    ^^^^^^^^^^^^^
ImportError: libcilkrts.so.5: cannot open shared object file: No such file or directory
Error running application
Master execution failed. Exiting job.

These are my loaded modules:

[bsc032371@gs08r2b69 compss-test]$ module list

Currently Loaded Modules:
  1) mkl/2023.2.0   3) oneapi/2023.2.0   5) intel/2024.1   7) hdf5/1.14.4.2    9) python/3.12.1   11) COMPSs/3.3
  2) ucx/1.15.0     4) bsc/1.0           6) impi/2021.12   8) sqlite3/3.45.2  10) papi/7.1.0-gcc
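The libcilkrts.so.5 error above is the same kind of failure with a different library, and it appeared after the loaded modules changed (python/3.12.1 and intel/2024.1 in the list above). To see which loaded module, if any, provides a given library, a rough sketch like the following lists every LD_LIBRARY_PATH directory that contains a file with exactly that name. It only inspects LD_LIBRARY_PATH (not the ld.so cache or RPATH entries), and find_in_ld_library_path is a hypothetical helper, not a COMPSs or module-system command.

```python
import os
from typing import List, Optional


def find_in_ld_library_path(soname: str, ld_path: Optional[str] = None) -> List[str]:
    """Return the directories on LD_LIBRARY_PATH that contain a file named `soname`."""
    if ld_path is None:
        ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in ld_path.split(os.pathsep):
        # Empty entries (from a leading, trailing or doubled ':') are skipped here.
        if directory and os.path.exists(os.path.join(directory, soname)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Sonames reported as missing in this issue.
    for name in ("libhdf5.so.310", "libcilkrts.so.5"):
        dirs = find_in_ld_library_path(name)
        print(name, "->", dirs if dirs else "not on LD_LIBRARY_PATH")
```

An empty result for a library only means that no directory on LD_LIBRARY_PATH contains it; comparing the output before and after a module load is a cheap way to narrow down which module is expected to supply it.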
kinow commented 5 months ago

Logged in to glogin4, then tried the same as you, @manuel-g-castro, but also loading impi, intel and hdf5:

$ salloc -N 1 -t 30:00 --account=bsc32 --qos=gp_bsces
$ module load python/3.8.18
$ module load COMPSs/3.3
$ module load impi/2021.12 intel/2024.1 hdf5/1.14.4.2
$ file test.py
test.py: Python script, ASCII text executable
$ launch_compss \
    --sc_cfg=mn.cfg \
    --master_node="$SLURMD_NODENAME" \
    --worker_nodes="" \
    --worker_in_master_cpus=48 \
    --lang="python" \
    --pythonpath=$(pwd) \
    test.py
------ Launching COMPSs application ------
[ INFO ] Using default execution type: compss

----------------- Executing test.py --------------------------

WARNING: Import ERROR importing Numpy
Traceback (most recent call last):
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 713, in <module>
    compss_main()
  File "/apps/GPP/COMPSs/3.3//Bindings/python/3/pycompss/runtime/launch.py", line 237, in compss_main
    compss_start(log_level, tracing, False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/api/api.py", line 121, in compss_start
    __start_runtime__(log_level, tracing, interactive, disable_external)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/binding.py", line 90, in start_runtime
    COMPSs.load_runtime(external_process=False)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/COMPSs.py", line 82, in load_runtime
    self.compss = establish_link(_logger)
  File "/apps/GPP/COMPSs/3.3/Bindings/python/3/pycompss/runtime/management/link/direct.py", line 50, in establish_link
    import compss  # pylint: disable=import-outside-toplevel
ImportError: libhdf5.so.310: cannot open shared object file: No such file or directory
Error running application
Master execution failed. Exiting job.

There might be some other combination of modules that works, but I guess @jorgee knows better which ones have to be loaded for your script to work.

jorgee commented 5 months ago

It also happens without COMPSs. Please report it to support@bsc.es.

[bsc019611@glogin2 ~]$ module load python/3.8.18
load PYTHON/3.8.18 (PATH, MANPATH, LD_LIBRARY_PATH, LIBRARY_PATH, PKG_CONFIG_PATH, C_INCLUDE_PATH, CPLUS_INCLUDE_PATH, PYTHONHOME, PYTHONPATH)
[bsc019611@glogin2 ~]$ python3
Python 3.8.18 (default, Feb 7 2024, 09:13:21)
[Clang 17.0.0 (icx 2024.0.0.20231017)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "/apps/GPP/PYTHON/3.8.18/INTEL/lib/python3.8/site-packages/numpy/core/__init__.py", line 23, in <module>
    from . import multiarray
  File "/apps/GPP/PYTHON/3.8.18/INTEL/lib/python3.8/site-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/apps/GPP/PYTHON/3.8.18/INTEL/lib/python3.8/site-packages/numpy/core/overrides.py", line 6, in <module>
    from numpy.core._multiarray_umath import (
ImportError: libhdf5.so.310: cannot open shared object file: No such file or directory
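Since the error reproduces in a plain python3 session, it can be checked without COMPSs at all. A minimal sketch, using only the standard library: try_import (a hypothetical helper) imports a module in the current interpreter and returns the ImportError text instead of a full traceback, which makes it quicker to compare module combinations, for example with and without module load hdf5, before calling launch_compss.

```python
import importlib
from typing import Optional, Tuple


def try_import(module_name: str) -> Tuple[bool, Optional[str]]:
    """Import `module_name`; return (True, None) on success or (False, error text)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # For a missing shared library the text contains the loader error, e.g.
        # "libhdf5.so.310: cannot open shared object file: No such file or directory"
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    ok, err = try_import("numpy")
    print("numpy import OK" if ok else f"numpy import failed: {err}")
```

NumPy's numpy/core/__init__.py catches the original ImportError and re-raises it with a longer explanatory message, so the text returned for numpy may be longer than the single ImportError line shown above while still containing the loader error.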

jorgee commented 5 months ago

I think they have changed the default Intel modules. Are you using COMPSS_PYTHON_VERSION? It would be great if you sent these MareNostrum-related errors to support-compss@bsc.es, as they are more related to the installation on the supercomputer than to errors in the code.

manuel-g-castro commented 2 months ago

Hello, @jorgee. Sorry for the late response; I was stuck in Ph.D. bureaucracy for a while.

I read in your documentation that it should not be necessary to set that variable if you load the Python module before COMPSs.

Anyhow, I tried again and the issue persisted, so I contacted support and put you and Daniele in CC (I hope that is not too much of a bother).

Thank you.

manuel-g-castro commented 2 months ago

Closing this issue because Javier Conejero has answered: I needed to load the hdf5 module.