DeiC-HPC / cotainr

cotainr - a user space Apptainer/Singularity container builder.
European Union Public License 1.2

Issue #44: ValueError: Invalid command cmd='conda env create -f /tmp/tmpnyhxbslm/singularity_sandbox/py311_rocm.yml -n

Open: kaare-mikkelsen opened this issue 9 months ago

kaare-mikkelsen commented 9 months ago

Using this yml file (a previous version required Python 3.11.5, hence the name):

py311_rocm.yml

name: py_rocm542_pytorch
channels:
  - conda-forge
dependencies:
  - mne
  - mne-bids
  - neptune-client
  - pytorch-lightning
  - matplotlib
  - numpy
  - scikit-learn
  - optuna
  - tabulate
  - pandas
  - pip
  - h5py
  - python
  - pip:

And this build command, on lumi-g:

cotainr build lumi_pytorch_rocm_demo.sif --system=lumi-g --conda-env py311_rocm.yml

I get the error below (sorry for the mixed formatting, I don't know how to turn off Markdown here):

>>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/conda/exceptions.py", line 1132, in __call__
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/conda_env/cli/main.py", line 78, in do_call
    exit_code = getattr(module, func_name)(args, parser)
  File "/opt/conda/lib/python3.10/site-packages/conda/notices/core.py", line 121, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/conda_env/cli/main_create.py", line 97, in execute
    spec = specs.detect(
  File "/opt/conda/lib/python3.10/site-packages/conda_env/specs/__init__.py", line 68, in detect
    if spec.can_handle():
  File "/opt/conda/lib/python3.10/site-packages/conda_env/specs/yaml_file.py", line 18, in can_handle
    self._environment = env.from_file(self.filename)
  File "/opt/conda/lib/python3.10/site-packages/conda_env/env.py", line 171, in from_file
    return from_yaml(yamlstr, filename=filename)
  File "/opt/conda/lib/python3.10/site-packages/conda_env/env.py", line 140, in from_yaml
    data = yaml_safe_load(yamlstr)
  File "/opt/conda/lib/python3.10/site-packages/conda/common/serialize.py", line 51, in yaml_safe_load
    return _yaml_safe().load(string)
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/main.py", line 426, in load
    return constructor.get_single_data()
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 111, in get_single_data
    node = self.composer.get_single_node()
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/composer.py", line 73, in get_single_node
    if not self.parser.check_event(StreamEndEvent):
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/parser.py", line 139, in check_event
    self.current_event = self.state()
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/parser.py", line 209, in parse_document_start
    raise ParserError(
ruamel.yaml.parser.ParserError: expected '', but found ('',)
  in "", line 2, column 1:
    channels:
    ^ (line: 2)

If submitted, this report will be used by core maintainers to improve future releases of conda.

$ /opt/conda/bin/conda-env create -f /tmp/tmpnyhxbslm/singularity_sandbox/py311_rocm.yml -n conda_container_env

environment variables:
  CIO_TEST=
  CMAKE_PREFIX_PATH=/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1:/appl/lumi/SW/system/EB/lumi-tools/23.04
  CONDA_AUTO_UPDATE_CONDA=false
  CONDA_EXE=/opt/conda/bin/conda
  CONDA_PYTHON_EXE=/opt/conda/bin/python
  CONDA_ROOT=/opt/conda
  CONDA_SHLVL=0
  CRAYPAT_LD_LIBRARY_PATH=/opt/cray/pe/gcc-libs:/opt/cray/gcc-libs:/opt/cray/pe/perftools/22.12.0/lib64
  CRAY_LD_LIBRARY_PATH=/opt/cray/pe/libsci/22.12.1.1/CRAY/9.0/x86_64/lib:/opt/cray/pe/mpich/8.1.23/ofi/crayclang/10.0/lib:/opt/cray/pe/mpich/8.1.23/gtl/lib:/opt/cray/pe/dsmml/0.2.2/dsmml/lib:/opt/cray/pe/cce/15.0.0/cce-clang/x86_64/lib:/opt/cray/pe/cce/15.0.0/cce/x86_64/lib:/opt/cray/pe/perftools/22.12.0/lib64
  CURL_CA_BUNDLE=
  LD_LIBRARY_PATH=/.singularity.d/libs
  LD_PRELOAD=
  LMOD_PACKAGE_PATH=/appl/lumi/LUMI-SoftwareStack/LMOD
  LUMI_VISIBILITYHOOKDATAPATH=/appl/lumi/mgmt/LMOD/VisibilityHookData/?.lua
  MANPATH=/opt/cray/pe/python/3.9.12.1/share/man:/opt/cray/libfabric/1.15.2.0/share/man:/appl/lumi/SW/system/EB/lumi-tools/23.04/share/man:/opt/cray/pe/libsci/22.12.1.1/man:/opt/cray/pe/man/csmlversion:/opt/cray/pe/mpich/8.1.23/ofi/man:/opt/cray/pe/mpich/8.1.23/man/mpich:/opt/cray/pe/dsmml/0.2.2/dsmml/man:/opt/cray/pe/craype/2.7.19/man:/opt/cray/pe/cce/15.0.0/cce-clang/x86_64/share/man:/opt/cray/pe/cce/15.0.0/man:/opt/cray/pe/perftools/22.12.0/man:/opt/cray/pe/papi/6.0.0.17/share/pdoc/man:/usr/share/lmod/lmod/share/man:/usr/local/man:/usr/share/man:/usr/man
  MODULEPATH=/opt/cray/pe/lmod/modulefiles/comnet/crayclang/14.0/ofi/1.0:/opt/cray/pe/lmod/modulefiles/net/ofi/1.0:/opt/cray/pe/lmod/modulefiles/cpu/x86-rome/1.0:/users/mikkelse/EasyBuild/modules/LUMI/22.08/partition/L:/users/mikkelse/EasyBuild/modules/LUMI/22.08/partition/common:/appl/lumi/modules/easybuild/LUMI/22.08/partition/L:/appl/lumi/modules/easybuild/LUMI/22.08/partition/common:/appl/lumi/modules/spack/LUMI/22.08/partition/L/cray-sles15-zen2:/appl/lumi/modules/spack/LUMI/22.08/partition/common/cray-sles15-zen2:/appl/lumi/modules/manual/LUMI/22.08/partition/L:/appl/lumi/modules/manual/LUMI/22.08/partition/common:/appl/lumi/modules/easybuild/system:/appl/lumi/modules/Infrastructure/LUMI/22.08/partition/L:/opt/cray/pe/lmod/modulefiles/core:/opt/cray/pe/lmod/modulefiles/craype-targets/default:/opt/cray/modulefiles:/opt/modulefiles:/appl/lumi/modules/SystemPartition/LUMI/22.08:/opt/cray/pe/lmod/modulefiles/mpi/crayclang/14.0/ofi/1.0/cray-mpich/8.0:/opt/cray/pe/lmod/modulefiles/compiler/crayclang/14.0:/opt/cray/pe/lmod/modulefiles/mix_compilers:/opt/cray/pe/lmod/modulefiles/perftools/22.12.0:/usr/share/modulefiles/Linux:/usr/share/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core:/appl/lumi/modules/SoftwareStack:/appl/lumi/modules/StyleModifiers:/appl/lumi/modules/init-LUMI-SoftwareStack
  NLSPATH=/opt/cray/pe/cce/15.0.0/cce/x86_64/share/nls/En/%N.cat
  PATH=/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/rocm/bin
  PE_CRAYCLANG_FIXED_PKGCONFIG_PATH=/opt/cray/pe/libsci/22.12.1.1/CRAY/9.0/x86_64/lib/pkgconfig:/opt/cray/pe/mpich/8.1.23/ofi/crayclang/10.0/lib/pkgconfig
  PE_LIBSCI_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/libsci/22.12.1.1/@PRGENV@/@PE_LIBSCI_GENCOMPS@/@PE_LIBSCI_TARGET@/lib/pkgconfig
  PKG_CONFIG_PATH=/opt/cray/libfabric/1.15.2.0/lib64/pkgconfig:/opt/cray/pe/dsmml/0.2.2/dsmml/lib/pkgconfig:/opt/cray/pe/craype/2.7.19/pkg-config
  PYTHONPATH=/opt/cray/pe/python/3.9.12.1
  PYTHON_PATH=/opt/cray/pe/python/3.9.12.1
  REQUESTS_CA_BUNDLE=
  SSL_CERT_FILE=
  USER_PATH=/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/bin:/opt/cray/pe/python/3.9.12.1/bin:/opt/cray/libfabric/1.15.2.0/bin:/pfs/lustrep1/users/mikkelse/.vscode-server/bin/fdb98833154679dbaa7af67a5a29fe19e55c2b73/bin/remote-cli:/appl/lumi/SW/system/EB/lumi-tools/23.04/bin:/opt/cray/pe/mpich/8.1.23/ofi/crayclang/10.0/bin:/opt/cray/pe/mpich/8.1.23/bin:/opt/cray/pe/craype/2.7.19/bin:/opt/cray/pe/cce/15.0.0/binutils/x86_64/x86_64-pc-linux-gnu/bin:/opt/cray/pe/cce/15.0.0/binutils/cross/x86_64-aarch64/aarch64-linux-gnu/../bin:/opt/cray/pe/cce/15.0.0/utils/x86_64/bin:/opt/cray/pe/cce/15.0.0/bin:/opt/cray/pe/cce/15.0.0/cce-clang/x86_64/bin:/opt/cray/pe/perftools/22.12.0/bin:/opt/cray/pe/papi/6.0.0.17/bin:/users/mikkelse/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/opt/cray/pe/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
  XNLSPATH=/usr/X11R6/lib/X11/nls
  __LMOD_REF_COUNT_CMAKE_PREFIX_PATH=/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1:1;/appl/lumi/SW/system/EB/lumi-tools/23.04:1
  __LMOD_REF_COUNT_CRAY_LD_LIBRARY_PATH=/opt/cray/pe/libsci/22.12.1.1/CRAY/9.0/x86_64/lib:1;/opt/cray/pe/mpich/8.1.23/ofi/crayclang/10.0/lib:1;/opt/cray/pe/mpich/8.1.23/gtl/lib:1;/opt/cray/pe/dsmml/0.2.2/dsmml/lib:1;/opt/cray/pe/cce/15.0.0/cce-clang/x86_64/lib:1;/opt/cray/pe/cce/15.0.0/cce/x86_64/lib:1;/opt/cray/pe/perftools/22.12.0/lib64:1
  __LMOD_REF_COUNT_LD_LIBRARY_PATH=/opt/cray/pe/python/3.9.12.1/lib:1;/opt/cray/pe/gcc-libs:1;/opt/cray/libfabric/1.15.2.0/lib64:1;/opt/cray/pe/papi/6.0.0.17/lib64:1
  __LMOD_REF_COUNT_MANPATH=/opt/cray/pe/python/3.9.12.1/share/man:1;/opt/cray/libfabric/1.15.2.0/share/man:1;/appl/lumi/SW/system/EB/lumi-tools/23.04/share/man:1;/opt/cray/pe/libsci/22.12.1.1/man:1;/opt/cray/pe/man/csmlversion:1;/opt/cray/pe/mpich/8.1.23/ofi/man:1;/opt/cray/pe/mpich/8.1.23/man/mpich:1;/opt/cray/pe/dsmml/0.2.2/dsmml/man:1;/opt/cray/pe/craype/2.7.19/man:1;/opt/cray/pe/cce/15.0.0/cce-clang/x86_64/share/man:1;/opt/cray/pe/cce/15.0.0/man:1;/opt/cray/pe/perftools/22.12.0/man:1;/opt/cray/pe/papi/6.0.0.17/share/pdoc/man:1;/usr/share/lmod/lmod/share/man:1;/usr/local/man:1;/usr/share/man:1;/usr/man:1
  __LMOD_REF_COUNT_MODULEPATH=/opt/cray/pe/lmod/modulefiles/comnet/crayclang/14.0/ofi/1.0:1;/opt/cray/pe/lmod/modulefiles/net/ofi/1.0:1;/opt/cray/pe/lmod/modulefiles/cpu/x86-rome/1.0:1;/users/mikkelse/EasyBuild/modules/LUMI/22.08/partition/L:1;/users/mikkelse/EasyBuild/modules/LUMI/22.08/partition/common:1;/appl/lumi/modules/easybuild/LUMI/22.08/partition/L:1;/appl/lumi/modules/easybuild/LUMI/22.08/partition/common:1;/appl/lumi/modules/spack/LUMI/22.08/partition/L/cray-sles15-zen2:1;/appl/lumi/modules/spack/LUMI/22.08/partition/common/cray-sles15-zen2:1;/appl/lumi/modules/manual/LUMI/22.08/partition/L:1;/appl/lumi/modules/manual/LUMI/22.08/partition/common:1;/appl/lumi/modules/easybuild/system:2;/appl/lumi/modules/Infrastructure/LUMI/22.08/partition/L:1;/opt/cray/pe/lmod/modulefiles/core:2;/opt/cray/pe/lmod/modulefiles/craype-targets/default:2;/opt/cray/modulefiles:2;/opt/modulefiles:1;/appl/lumi/modules/SystemPartition/LUMI/22.08:1;/opt/cray/pe/lmod/modulefiles/mpi/crayclang/14.0/ofi/1.0/cray-mpich/8.0:1;/opt/cray/pe/lmod/modulefiles/compiler/crayclang/14.0:1;/opt/cray/pe/lmod/modulefiles/mix_compilers:1;/opt/cray/pe/lmod/modulefiles/perftools/22.12.0:1;/usr/share/modulefiles/Linux:1;/usr/share/modulefiles/Core:1;/usr/share/lmod/lmod/modulefiles/Core:1;/appl/lumi/modules/SoftwareStack:1;/appl/lumi/modules/StyleModifiers:1;/appl/lumi/modules/init-LUMI-SoftwareStack:1
  __LMOD_REF_COUNT_NLSPATH=/opt/cray/pe/cce/15.0.0/cce/x86_64/share/nls/En/%N.cat:1
  __LMOD_REF_COUNT_PATH=/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/bin:1;/opt/cray/pe/python/3.9.12.1/bin:1;/opt/cray/libfabric/1.15.2.0/bin:1;/pfs/lustrep1/users/mikkelse/.vscode-server/bin/fdb98833154679dbaa7af67a5a29fe19e55c2b73/bin/remote-cli:1;/appl/lumi/SW/system/EB/lumi-tools/23.04/bin:1;/opt/cray/pe/mpich/8.1.23/ofi/crayclang/10.0/bin:1;/opt/cray/pe/mpich/8.1.23/bin:1;/opt/cray/pe/craype/2.7.19/bin:1;/opt/cray/pe/cce/15.0.0/binutils/x86_64/x86_64-pc-linux-gnu/bin:1;/opt/cray/pe/cce/15.0.0/binutils/cross/x86_64-aarch64/aarch64-linux-gnu/../bin:1;/opt/cray/pe/cce/15.0.0/utils/x86_64/bin:1;/opt/cray/pe/cce/15.0.0/bin:1;/opt/cray/pe/cce/15.0.0/cce-clang/x86_64/bin:1;/opt/cray/pe/perftools/22.12.0/bin:1;/opt/cray/pe/papi/6.0.0.17/bin:1;/users/mikkelse/.local/bin:1;/usr/local/bin:1;/usr/bin:1;/bin:1;/usr/lib/mit/bin:1;/opt/cray/pe/bin:1
  __LMOD_REF_COUNT_PE_CRAYCLANG_FIXED_PKGCONFIG_PATH=/opt/cray/pe/libsci/22.12.1.1/CRAY/9.0/x86_64/lib/pkgconfig:1;/opt/cray/pe/mpich/8.1.23/ofi/crayclang/10.0/lib/pkgconfig:1
  __LMOD_REF_COUNT_PKG_CONFIG_PATH=/opt/cray/libfabric/1.15.2.0/lib64/pkgconfig:1;/opt/cray/pe/dsmml/0.2.2/dsmml/lib/pkgconfig:1;/opt/cray/pe/craype/2.7.19/pkg-config:1
  __LMOD_REF_COUNT_PYTHONPATH=/opt/cray/pe/python/3.9.12.1:1
  __LMOD_REF_COUNT_PYTHON_PATH=/opt/cray/pe/python/3.9.12.1:1

    active environment : None
           shell level : 0
      user config file : /users/mikkelse/.condarc
populated config files : /opt/conda/.condarc
         conda version : 23.3.1
   conda-build version : not installed
        python version : 3.10.12.final.0
      virtual packages : __archspec=1=x86_64
                         __glibc=2.31=0
                         __linux=5.14.21=0
                         __unix=0=0
      base environment : /opt/conda (writable)
     conda av data dir : /opt/conda/etc/conda
 conda av metadata url : None
          channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                         https://conda.anaconda.org/conda-forge/noarch
         package cache : /opt/conda/pkgs
                         /users/mikkelse/.conda/pkgs
      envs directories : /opt/conda/envs
                         /users/mikkelse/.conda/envs
              platform : linux-64
            user-agent : conda/23.3.1 requests/2.31.0 CPython/3.10.12 Linux/5.14.21-150400.24.46_12.0.73-cray_shasta_c ubuntu/20.04.5 glibc/2.31
               UID:GID : 327000848:327000848
            netrc file : None
          offline mode : False

An unexpected error has occurred. Conda has prepared the above report.

Would you like conda to send this report to the core maintainers? [y/N]:
Timeout reached. No report sent.

Traceback (most recent call last):
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/container.py", line 168, in run_command_in_container
    process = self._subprocess_runner(
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/container.py", line 225, in _subprocess_runner
    return util.stream_subprocess(args=args, **kwargs)
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/util.py", line 113, in stream_subprocess
    completed_process.check_returncode()
  File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['singularity', 'exec', '--writable', '--no-home', '--no-umask', PosixPath('/tmp/tmpnyhxbslm/singularity_sandbox'), 'conda', 'env', 'create', '-f', '/tmp/tmpnyhxbslm/singularity_sandbox/py311_rocm.yml', '-n', 'conda_container_env']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/bin/cotainr", line 14, in
    sys.exit(main())
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/cli.py", line 390, in main
    cli.subcommand.execute()
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/cli.py", line 148, in execute
    conda_install.add_environment(path=conda_env_file, name=conda_env_name)
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/pack.py", line 76, in add_environment
    self.sandbox.run_command_in_container(
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/container.py", line 183, in run_command_in_container
    raise ValueError(
ValueError: Invalid command cmd='conda env create -f /tmp/tmpnyhxbslm/singularity_sandbox/py311_rocm.yml -n conda_container_env' passed to Singularity resulted in the FATAL error:

kaare-mikkelsen commented 8 months ago

From playing around with different yml files, it seems like the error is related to the ROCm version. At least I seem to get working builds again when I downgrade to ROCm 5.2 and torch 1.13.

Chroxvi commented 8 months ago

Thanks a lot for reporting this issue.

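As you have already discovered, it appears to be an issue with the conda environment specification in the yml file: conda's YAML parser rejects the file (see the ParserError at line 2, column 1 in the report above).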
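Since the failure comes from conda's YAML parser rejecting the file at line 2, column 1, a quick layout check before running cotainr build may catch some of these mistakes early. The sketch below is a hypothetical helper (check_env_layout is not part of cotainr or conda) that deliberately uses only the Python standard library, so it is a heuristic rather than a YAML parser: it flags tab indentation and known top-level keys (name, channels, dependencies, prefix, variables) that do not start in column 1. An indented first key, for example, is one layout mistake that can make a YAML parser stop with a ParserError at the first unindented line.

```python
import re

# Top-level keys assumed to be valid in a conda environment file.
KNOWN_TOP_LEVEL_KEYS = {"name", "channels", "dependencies", "prefix", "variables"}

# A "key:" mapping line: captures the leading whitespace and the key itself.
KEY_LINE = re.compile(r"^(?P<indent>[ \t]*)(?P<key>[A-Za-z_][\w-]*):(?:\s|$)")


def check_env_layout(text: str) -> list[str]:
    """Return layout problems found in the text of a conda environment file.

    Heuristic only: this does not parse YAML. It flags tab indentation and
    known top-level keys that do not start in column 1.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments do not affect this check
        leading = line[: len(line) - len(line.lstrip())]
        if "\t" in leading:
            problems.append(f"line {lineno}: tab used for indentation")
        match = KEY_LINE.match(line)
        if match and match["indent"] and match["key"] in KNOWN_TOP_LEVEL_KEYS:
            problems.append(f"line {lineno}: top-level key '{match['key']}' is indented")
    return problems


if __name__ == "__main__":
    good = "name: demo\nchannels:\n  - conda-forge\ndependencies:\n  - python\n"
    bad = " name: demo\nchannels:\n  - conda-forge\n"
    print(check_env_layout(good))  # []
    print(check_env_layout(bad))   # ["line 1: top-level key 'name' is indented"]
```

An empty list only means none of these particular mistakes were found; the YAML parser conda uses remains the authoritative check.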

I think cotainr needs to provide a clearer error message for cases like this. I believe we should be able to provide much improved error messages once the work in the output formatting branch is merged.
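As an illustration of what a clearer message could look like, the sketch below runs a command and, on failure, raises a ValueError that quotes the most relevant line of the command's output, chaining the original CalledProcessError. It is hypothetical: run_and_explain and summarize_failure are illustrative names, not cotainr's API, and the code captures output with subprocess.run rather than streaming it the way cotainr's stream_subprocess does. Looking for the last line that resembles an exception (such as ruamel.yaml.parser.ParserError: ...) matters here because conda prints its environment dump and conda info after the traceback, so the tail of the output alone would not show the cause.

```python
import re
import subprocess

# A final traceback line such as "ruamel.yaml.parser.ParserError: expected ...".
EXCEPTION_LINE = re.compile(r"^[\w.]*(?:Error|Exception): .*$", re.MULTILINE)


def summarize_failure(output: str, tail_lines: int = 3) -> str:
    """Pick the most informative part of a failed command's output."""
    exception_lines = EXCEPTION_LINE.findall(output)
    if exception_lines:
        # Prefer the last exception line printed over the generic tail.
        return exception_lines[-1]
    lines = [line for line in output.splitlines() if line.strip()]
    return "\n".join(lines[-tail_lines:]) or "<no output>"


def run_and_explain(args: list[str]) -> subprocess.CompletedProcess:
    """Run args; on failure raise a ValueError that quotes the likely cause."""
    try:
        return subprocess.run(args, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as err:
        output = (err.stdout or "") + "\n" + (err.stderr or "")
        raise ValueError(
            f"Command {' '.join(args)!r} failed with exit status "
            f"{err.returncode}:\n{summarize_failure(output)}"
        ) from err
```

With a failure like the one reported here, the ParserError line would be included in the ValueError message instead of the message ending at "resulted in the FATAL error:", while the chained CalledProcessError still keeps the full output for debugging.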

eskech commented 8 months ago

@Chroxvi, can we close this one now that the output formatting work is merged, or do you still need to make some changes?

Chroxvi commented 8 months ago

I think we still need to add logging of exceptions in a few critical places. I have a backlog item for this.
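A minimal sketch of logging exceptions at a critical step, assuming Python's standard logging module: a decorator logs the failure together with its traceback via logger.exception and then re-raises, so existing error handling is unchanged. The logger name, the log_failures decorator and the create_conda_env stand-in are all hypothetical, not part of cotainr.

```python
import functools
import logging

# Hypothetical logger name for this sketch.
logger = logging.getLogger("cotainr_sketch")


def log_failures(step: str):
    """Decorator: log any exception from the wrapped build step, then re-raise."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                # logger.exception logs at ERROR level and attaches the traceback.
                logger.exception("Build step %r failed", step)
                raise  # keep the original exception and its handling unchanged

        return wrapper

    return decorator


@log_failures("conda environment creation")
def create_conda_env(env_file: str) -> None:
    """Hypothetical stand-in for a build step that can fail."""
    raise ValueError(f"could not parse {env_file}")
```

Calling create_conda_env("py311_rocm.yml") logs "Build step 'conda environment creation' failed" at ERROR level with the traceback attached, then propagates the ValueError to the caller as before.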

eskech commented 8 months ago

Would a quick fix for now, until you have the time, be to add this to a FAQ section in the documentation?