Open kaare-mikkelsen opened 1 year ago
From playing around with different yml files, it seems the error is related to the ROCm version. At least, I get working builds again when I downgrade to rocm5.2 and torch 1.13.
Thanks a lot for reporting this issue.
As you have already discovered, it appears to be an issue with the conda environment specification in the yml file.
I think cotainr needs to provide a clearer error message for cases like this. I believe we should be able to provide much improved error messages once we have the work in the output formatting branch merged.
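Until the error messages improve, a rough pre-check of the environment file before starting a build might save some time. The yml file itself is not shown in this thread, so the following is only a sketch under assumptions: it is a line-based heuristic, not a YAML parser, and the function name `suspicious_top_level_lines` is made up for illustration. It flags two things that can plausibly lead to a parser error like the one reported at `line 2, column 1`: a first entry that is indented, and an unindented line that does not look like a `key:` entry.

```python
import re

# An unindented mapping entry: "key:" alone, or "key:" followed by whitespace and a value.
_TOP_LEVEL_KEY = re.compile(r"^[A-Za-z_][\w.-]*:(\s.*)?$")


def suspicious_top_level_lines(text):
    """Return (line_number, reason) pairs for lines that look structurally wrong.

    Heuristic only: this does not parse YAML and will miss many problems.
    """
    findings = []
    seen_top_level = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines, comments and document markers are ignored.
        if not stripped or stripped.startswith("#") or stripped in ("---", "..."):
            continue
        if line[0] in " \t":
            # Indented content is only suspicious before any top-level key.
            if not seen_top_level:
                findings.append((number, "indented before any top-level key"))
            continue
        seen_top_level = True
        if stripped.startswith("-"):
            continue  # unindented list items are not checked here
        if not _TOP_LEVEL_KEY.match(stripped):
            findings.append((number, "not a 'key:' entry"))
    return findings


if __name__ == "__main__":
    # Hypothetical file content with an accidentally indented first line.
    example = " name: demo\nchannels:\n  - conda-forge\n"
    for number, reason in suspicious_top_level_lines(example):
        print(f"line {number}: {reason}")
```

An empty result does not mean the file is valid; it only means this particular heuristic found nothing. Loading the file with a real YAML parser is the more reliable check.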
@Chroxvi can we close this one with the output merged or do you still need to provide some changes?
I think we still need to add logging of exceptions in a few critical places. I have a backlog item for this.
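To illustrate the kind of exception reporting discussed here, below is a minimal sketch using only the Python standard library. It is not cotainr's implementation: `ContainerCommandError` and `run_and_summarize` are hypothetical names, and the idea is simply to quote the last few lines of the failed command's stderr instead of a bare exit status.

```python
import subprocess
import sys


class ContainerCommandError(RuntimeError):
    """Hypothetical exception type; not part of cotainr's actual API."""


def run_and_summarize(args, tail_lines=5):
    """Run a command; on failure, raise an error quoting the end of its stderr."""
    completed = subprocess.run(args, capture_output=True, text=True)
    if completed.returncode != 0:
        # Keep only the tail of stderr, where the real cause usually appears.
        tail = "\n".join(completed.stderr.strip().splitlines()[-tail_lines:])
        raise ContainerCommandError(
            f"Command {args!r} failed with exit status {completed.returncode}.\n"
            f"Last lines of stderr:\n{tail}"
        )
    return completed.stdout


if __name__ == "__main__":
    try:
        # Stand-in for a failing build step: a child process that exits with an error.
        run_and_summarize(
            [sys.executable, "-c", "import sys; sys.exit('ParserError: bad yml')"]
        )
    except ContainerCommandError as err:
        print(err)
```

Whether a short tail like this would have been enough in the case reported here is an open question, since conda printed a long report around the actual parser error.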
Would a quick fix for now, until you have the time, be to add this to a FAQ section in the documentation?
Using this yml file (previous version required python 3.11.5, hence the name):
py311_rocm.yml
And this build command, on lumi-g:
I get the below error (sorry for the mixed formatting, don't know how to turn off markdown here):
>>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
If submitted, this report will be used by core maintainers to improve
future releases of conda.
    return _yaml_safe().load(string)
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/main.py", line 426, in load
    return constructor.get_single_data()
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/constructor.py", line 111, in get_single_data
    node = self.composer.get_single_node()
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/composer.py", line 73, in get_single_node
    if not self.parser.check_event(StreamEndEvent):
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/parser.py", line 139, in check_event
    self.current_event = self.state()
  File "/opt/conda/lib/python3.10/site-packages/ruamel/yaml/parser.py", line 209, in parse_document_start
    raise ParserError(
ruamel.yaml.parser.ParserError: expected '', but found ('',)
in "", line 2, column 1:
channels:
^ (line: 2)
$ /opt/conda/bin/conda-env create -f /tmp/tmpnyhxbslm/singularity_sandbox/py311_rocm.yml -n conda_container_env
environment variables:
CIO_TEST=
CMAKE_PREFIX_PATH=/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-
python-3.9.12.1:/appl/lumi/SW/system/EB/lumi-tools/23.04
CONDA_AUTO_UPDATE_CONDA=false
CONDA_EXE=/opt/conda/bin/conda
CONDA_PYTHON_EXE=/opt/conda/bin/python
CONDA_ROOT=/opt/conda
CONDA_SHLVL=0
CRAYPAT_LD_LIBRARY_PATH=/opt/cray/pe/gcc-libs:/opt/cray/gcc-
libs:/opt/cray/pe/perftools/22.12.0/lib64
CRAY_LD_LIBRARY_PATH=/opt/cray/pe/libsci/22.12.1.1/CRAY/9.0/x86_64/lib:/opt/cray/pe/mpich/8
.1.23/ofi/crayclang/10.0/lib:/opt/cray/pe/mpich/8.1.23/gtl/lib:/opt/cr
ay/pe/dsmml/0.2.2/dsmml/lib:/opt/cray/pe/cce/15.0.0/cce-clang/x86_64/l
ib:/opt/cray/pe/cce/15.0.0/cce/x86_64/lib:/opt/cray/pe/perftools/22.12
.0/lib64
CURL_CA_BUNDLE=
LD_LIBRARY_PATH=/.singularity.d/libs
LD_PRELOAD=
LMOD_PACKAGE_PATH=/appl/lumi/LUMI-SoftwareStack/LMOD
LUMI_VISIBILITYHOOKDATAPATH=/appl/lumi/mgmt/LMOD/VisibilityHookData/?.lua
MANPATH=/opt/cray/pe/python/3.9.12.1/share/man:/opt/cray/libfabric/1.15.2.0/sh
are/man:/appl/lumi/SW/system/EB/lumi-tools/23.04/share/man:/opt/cray/p
e/libsci/22.12.1.1/man:/opt/cray/pe/man/csmlversion:/opt/cray/pe/mpich
/8.1.23/ofi/man:/opt/cray/pe/mpich/8.1.23/man/mpich:/opt/cray/pe/dsmml
/0.2.2/dsmml/man:/opt/cray/pe/craype/2.7.19/man:/opt/cray/pe/cce/15.0.
0/cce-clang/x86_64/share/man:/opt/cray/pe/cce/15.0.0/man:/opt/cray/pe/
perftools/22.12.0/man:/opt/cray/pe/papi/6.0.0.17/share/pdoc/man:/usr/s
hare/lmod/lmod/share/man:/usr/local/man:/usr/share/man:/usr/man
MODULEPATH=/opt/cray/pe/lmod/modulefiles/comnet/crayclang/14.0/ofi/1.0:/opt/cray/
pe/lmod/modulefiles/net/ofi/1.0:/opt/cray/pe/lmod/modulefiles/cpu/x86-
rome/1.0:/users/mikkelse/EasyBuild/modules/LUMI/22.08/partition/L:/use
rs/mikkelse/EasyBuild/modules/LUMI/22.08/partition/common:/appl/lumi/m
odules/easybuild/LUMI/22.08/partition/L:/appl/lumi/modules/easybuild/L
UMI/22.08/partition/common:/appl/lumi/modules/spack/LUMI/22.08/partiti
on/L/cray-
sles15-zen2:/appl/lumi/modules/spack/LUMI/22.08/partition/common/cray-
sles15-
zen2:/appl/lumi/modules/manual/LUMI/22.08/partition/L:/appl/lumi/modul
es/manual/LUMI/22.08/partition/common:/appl/lumi/modules/easybuild/sys
tem:/appl/lumi/modules/Infrastructure/LUMI/22.08/partition/L:/opt/cray
/pe/lmod/modulefiles/core:/opt/cray/pe/lmod/modulefiles/craype-targets
/default:/opt/cray/modulefiles:/opt/modulefiles:/appl/lumi/modules/Sys
temPartition/LUMI/22.08:/opt/cray/pe/lmod/modulefiles/mpi/crayclang/14
.0/ofi/1.0/cray-mpich/8.0:/opt/cray/pe/lmod/modulefiles/compiler/crayc
lang/14.0:/opt/cray/pe/lmod/modulefiles/mix_compilers:/opt/cray/pe/lmo
d/modulefiles/perftools/22.12.0:/usr/share/modulefiles/Linux:/usr/shar
e/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core:/appl/lumi/mo
dules/SoftwareStack:/appl/lumi/modules/StyleModifiers:/appl/lumi/modul
es/init-LUMI-SoftwareStack
NLSPATH=/opt/cray/pe/cce/15.0.0/cce/x86_64/share/nls/En/%N.cat
PATH=/opt/conda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:
/sbin:/bin:/opt/rocm/bin
PE_CRAYCLANG_FIXED_PKGCONFIG_PATH=/opt/cray/pe/libsci/22.12.1.1/CRAY/9.0/x86_64/lib/pkgconfig:/opt/cray/
pe/mpich/8.1.23/ofi/crayclang/10.0/lib/pkgconfig
PE_LIBSCI_VOLATILE_PKGCONFIG_PATH=/opt/cray/pe/libsci/22.12.1.1/@PRGENV@/@PE_LIBSCI_GENCOMPS@/@PE_LIBSCI
_TARGET@/lib/pkgconfig
PKG_CONFIG_PATH=/opt/cray/libfabric/1.15.2.0/lib64/pkgconfig:/opt/cray/pe/dsmml/0.2.2/
dsmml/lib/pkgconfig:/opt/cray/pe/craype/2.7.19/pkg-config
PYTHONPATH=/opt/cray/pe/python/3.9.12.1
PYTHON_PATH=/opt/cray/pe/python/3.9.12.1
REQUESTS_CA_BUNDLE=
SSL_CERT_FILE=
USER_PATH=/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-
3.9.12.1/bin:/opt/cray/pe/python/3.9.12.1/bin:/opt/cray/libfabric/1.15
.2.0/bin:/pfs/lustrep1/users/mikkelse/.vscode-
server/bin/fdb98833154679dbaa7af67a5a29fe19e55c2b73/bin/remote-
cli:/appl/lumi/SW/system/EB/lumi-tools/23.04/bin:/opt/cray/pe/mpich/8.
1.23/ofi/crayclang/10.0/bin:/opt/cray/pe/mpich/8.1.23/bin:/opt/cray/pe
/craype/2.7.19/bin:/opt/cray/pe/cce/15.0.0/binutils/x86_64/x86_64-pc-
linux-gnu/bin:/opt/cray/pe/cce/15.0.0/binutils/cross/x86_64-
aarch64/aarch64-linux-gnu/../bin:/opt/cray/pe/cce/15.0.0/utils/x86_64/
bin:/opt/cray/pe/cce/15.0.0/bin:/opt/cray/pe/cce/15.0.0/cce-clang/x86_
64/bin:/opt/cray/pe/perftools/22.12.0/bin:/opt/cray/pe/papi/6.0.0.17/b
in:/users/mikkelse/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mi
t/bin:/opt/cray/pe/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/u
sr/local/sbin
XNLSPATH=/usr/X11R6/lib/X11/nls
__LMOD_REF_COUNT_CMAKE_PREFIX_PATH=/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-
python-3.9.12.1:1;/appl/lumi/SW/system/EB/lumi-tools/23.04:1
__LMOD_REF_COUNT_CRAY_LD_LIBRARY_PATH=/opt/cray/pe/libsci/22.12.1.1/CRAY/9.0/x86_64/lib:1;/opt/cray/pe/mpich
/8.1.23/ofi/crayclang/10.0/lib:1;/opt/cray/pe/mpich/8.1.23/gtl/lib:1;/
opt/cray/pe/dsmml/0.2.2/dsmml/lib:1;/opt/cray/pe/cce/15.0.0/cce-clang/
x86_64/lib:1;/opt/cray/pe/cce/15.0.0/cce/x86_64/lib:1;/opt/cray/pe/per
ftools/22.12.0/lib64:1
__LMOD_REF_COUNT_LD_LIBRARY_PATH=/opt/cray/pe/python/3.9.12.1/lib:1;/opt/cray/pe/gcc-libs:1;/opt/cray/l
ibfabric/1.15.2.0/lib64:1;/opt/cray/pe/papi/6.0.0.17/lib64:1
__LMOD_REF_COUNT_MANPATH=/opt/cray/pe/python/3.9.12.1/share/man:1;/opt/cray/libfabric/1.15.2.0/
share/man:1;/appl/lumi/SW/system/EB/lumi-tools/23.04/share/man:1;/opt/
cray/pe/libsci/22.12.1.1/man:1;/opt/cray/pe/man/csmlversion:1;/opt/cra
y/pe/mpich/8.1.23/ofi/man:1;/opt/cray/pe/mpich/8.1.23/man/mpich:1;/opt
/cray/pe/dsmml/0.2.2/dsmml/man:1;/opt/cray/pe/craype/2.7.19/man:1;/opt
/cray/pe/cce/15.0.0/cce-clang/x86_64/share/man:1;/opt/cray/pe/cce/15.0
.0/man:1;/opt/cray/pe/perftools/22.12.0/man:1;/opt/cray/pe/papi/6.0.0.
17/share/pdoc/man:1;/usr/share/lmod/lmod/share/man:1;/usr/local/man:1;
/usr/share/man:1;/usr/man:1
__LMOD_REF_COUNT_MODULEPATH=/opt/cray/pe/lmod/modulefiles/comnet/crayclang/14.0/ofi/1.0:1;/opt/cra
y/pe/lmod/modulefiles/net/ofi/1.0:1;/opt/cray/pe/lmod/modulefiles/cpu/
x86-
rome/1.0:1;/users/mikkelse/EasyBuild/modules/LUMI/22.08/partition/L:1;
/users/mikkelse/EasyBuild/modules/LUMI/22.08/partition/common:1;/appl/
lumi/modules/easybuild/LUMI/22.08/partition/L:1;/appl/lumi/modules/eas
ybuild/LUMI/22.08/partition/common:1;/appl/lumi/modules/spack/LUMI/22.
08/partition/L/cray-sles15-
zen2:1;/appl/lumi/modules/spack/LUMI/22.08/partition/common/cray-sles1
5-
zen2:1;/appl/lumi/modules/manual/LUMI/22.08/partition/L:1;/appl/lumi/m
odules/manual/LUMI/22.08/partition/common:1;/appl/lumi/modules/easybui
ld/system:2;/appl/lumi/modules/Infrastructure/LUMI/22.08/partition/L:1
;/opt/cray/pe/lmod/modulefiles/core:2;/opt/cray/pe/lmod/modulefiles/cr
aype-targets/default:2;/opt/cray/modulefiles:2;/opt/modulefiles:1;/app
l/lumi/modules/SystemPartition/LUMI/22.08:1;/opt/cray/pe/lmod/modulefi
les/mpi/crayclang/14.0/ofi/1.0/cray-mpich/8.0:1;/opt/cray/pe/lmod/modu
lefiles/compiler/crayclang/14.0:1;/opt/cray/pe/lmod/modulefiles/mix_co
mpilers:1;/opt/cray/pe/lmod/modulefiles/perftools/22.12.0:1;/usr/share
/modulefiles/Linux:1;/usr/share/modulefiles/Core:1;/usr/share/lmod/lmo
d/modulefiles/Core:1;/appl/lumi/modules/SoftwareStack:1;/appl/lumi/mod
ules/StyleModifiers:1;/appl/lumi/modules/init-LUMI-SoftwareStack:1
__LMOD_REF_COUNT_NLSPATH=/opt/cray/pe/cce/15.0.0/cce/x86_64/share/nls/En/%N.cat:1
__LMOD_REF_COUNT_PATH=/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-
3.9.12.1/bin:1;/opt/cray/pe/python/3.9.12.1/bin:1;/opt/cray/libfabric/
1.15.2.0/bin:1;/pfs/lustrep1/users/mikkelse/.vscode-
server/bin/fdb98833154679dbaa7af67a5a29fe19e55c2b73/bin/remote-
cli:1;/appl/lumi/SW/system/EB/lumi-tools/23.04/bin:1;/opt/cray/pe/mpic
h/8.1.23/ofi/crayclang/10.0/bin:1;/opt/cray/pe/mpich/8.1.23/bin:1;/opt
/cray/pe/craype/2.7.19/bin:1;/opt/cray/pe/cce/15.0.0/binutils/x86_64/x
86_64-pc-linux-gnu/bin:1;/opt/cray/pe/cce/15.0.0/binutils/cross/x86_64
-aarch64/aarch64-linux-gnu/../bin:1;/opt/cray/pe/cce/15.0.0/utils/x86_
64/bin:1;/opt/cray/pe/cce/15.0.0/bin:1;/opt/cray/pe/cce/15.0.0/cce-cla
ng/x86_64/bin:1;/opt/cray/pe/perftools/22.12.0/bin:1;/opt/cray/pe/papi
/6.0.0.17/bin:1;/users/mikkelse/.local/bin:1;/usr/local/bin:1;/usr/bin
:1;/bin:1;/usr/lib/mit/bin:1;/opt/cray/pe/bin:1
__LMOD_REF_COUNT_PE_CRAYCLANG_FIXED_PKGCONFIG_PATH=/opt/cray/pe/libsci/22.12.1.1/CRAY/9.0/x86_64/lib/pkgconfig:1;/opt/cra
y/pe/mpich/8.1.23/ofi/crayclang/10.0/lib/pkgconfig:1
__LMOD_REF_COUNT_PKG_CONFIG_PATH=/opt/cray/libfabric/1.15.2.0/lib64/pkgconfig:1;/opt/cray/pe/dsmml/0.2.
2/dsmml/lib/pkgconfig:1;/opt/cray/pe/craype/2.7.19/pkg-config:1
__LMOD_REF_COUNT_PYTHONPATH=/opt/cray/pe/python/3.9.12.1:1
__LMOD_REF_COUNT_PYTHON_PATH=/opt/cray/pe/python/3.9.12.1:1
populated config files : /opt/conda/.condarc
conda version : 23.3.1
conda-build version : not installed
python version : 3.10.12.final.0
virtual packages : __archspec=1=x86_64
                   __glibc=2.31=0
                   __linux=5.14.21=0
                   __unix=0=0
base environment : /opt/conda (writable)
conda av data dir : /opt/conda/etc/conda
conda av metadata url : None
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
               https://conda.anaconda.org/conda-forge/noarch
package cache : /opt/conda/pkgs
                /users/mikkelse/.conda/pkgs
envs directories : /opt/conda/envs
                   /users/mikkelse/.conda/envs
platform : linux-64
user-agent : conda/23.3.1 requests/2.31.0 CPython/3.10.12 Linux/5.14.21-150400.24.46_12.0.73-cray_shasta_c ubuntu/20.04.5 glibc/2.31
UID:GID : 327000848:327000848
netrc file : None
offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
Timeout reached. No report sent.
Would you like conda to send this report to the core maintainers? [y/N]:
Traceback (most recent call last):
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/container.py", line 168, in run_command_in_container
    process = self._subprocess_runner(
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/container.py", line 225, in _subprocess_runner
    return util.stream_subprocess(args=args, **kwargs)
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/util.py", line 113, in stream_subprocess
    completed_process.check_returncode()
  File "/opt/cray/pe/python/3.9.12.1/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['singularity', 'exec', '--writable', '--no-home', '--no-umask', PosixPath('/tmp/tmpnyhxbslm/singularity_sandbox'), 'conda', 'env', 'create', '-f', '/tmp/tmpnyhxbslm/singularity_sandbox/py311_rocm.yml', '-n', 'conda_container_env']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/bin/cotainr", line 14, in
    sys.exit(main())
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/cli.py", line 390, in main
    cli.subcommand.execute()
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/cli.py", line 148, in execute
    conda_install.add_environment(path=conda_env_file, name=conda_env_name)
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/pack.py", line 76, in add_environment
    self.sandbox.run_command_in_container(
  File "/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/common/EB/cotainr/2023.01.0-cray-python-3.9.12.1/cotainr/container.py", line 183, in run_command_in_container
    raise ValueError(
ValueError: Invalid command cmd='conda env create -f /tmp/tmpnyhxbslm/singularity_sandbox/py311_rocm.yml -n conda_container_env' passed to Singularity resulted in the FATAL error: