ML4GW / aframev2

Detecting binary black hole mergers in LIGO with neural networks
MIT License

When running training: RuntimeError: operator torchvision::nms does not exist #160

Closed VasSkliris closed 1 month ago

VasSkliris commented 3 months ago
(base) [vasileios.skliris@ldas-pcdev2 train]$ apptainer run $AFRAME_CONTAINER_ROOT/train.sif python -m train --help

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/vasileios.skliris/aframev2/projects/train/train/__main__.py", line 1, in <module>
    from train.cli import main
  File "/home/vasileios.skliris/aframev2/projects/train/train/cli.py", line 4, in <module>
    from lightning.pytorch.cli import LightningCLI
  File "/usr/local/lib/python3.10/site-packages/lightning/__init__.py", line 21, in <module>
    from lightning.pytorch.callbacks import Callback  # noqa: E402
  File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/__init__.py", line 27, in <module>
    from lightning.pytorch.callbacks import Callback  # noqa: E402
  File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/callbacks/__init__.py", line 14, in <module>
    from lightning.pytorch.callbacks.batch_size_finder import BatchSizeFinder
  File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/callbacks/batch_size_finder.py", line 26, in <module>
    from lightning.pytorch.callbacks.callback import Callback
  File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/callbacks/callback.py", line 22, in <module>
    from lightning.pytorch.utilities.types import STEP_OUTPUT
  File "/usr/local/lib/python3.10/site-packages/lightning/pytorch/utilities/types.py", line 41, in <module>
    from torchmetrics import Metric
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit_permutate
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/audio/__init__.py", line 14, in <module>
    from torchmetrics.functional.audio.pit import permutation_invariant_training, pit_permutate  # noqa: F401
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/audio/pit.py", line 22, in <module>
    from torchmetrics.utilities.imports import _SCIPY_AVAILABLE
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/utilities/__init__.py", line 1, in <module>
    from torchmetrics.utilities.checks import check_forward_full_state_property  # noqa: F401
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/utilities/checks.py", line 25, in <module>
    from torchmetrics.utilities.data import select_topk, to_onehot
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/utilities/data.py", line 19, in <module>
    from torchmetrics.utilities.imports import _TORCH_GREATER_EQUAL_1_12, _XLA_AVAILABLE
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/utilities/imports.py", line 112, in <module>
    _TORCHVISION_GREATER_EQUAL_0_8: Optional[bool] = _compare_version("torchvision", operator.ge, "0.8.0")
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/utilities/imports.py", line 78, in _compare_version
    if not _module_available(package):
  File "/usr/local/lib/python3.10/site-packages/torchmetrics/utilities/imports.py", line 59, in _module_available
    module = import_module(module_names[0])
  File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/vasileios.skliris/.local/lib/python3.10/site-packages/torchvision/__init__.py", line 6, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils
  File "/home/vasileios.skliris/.local/lib/python3.10/site-packages/torchvision/_meta_registrations.py", line 164, in <module>
    def meta_nms(dets, scores, iou_threshold):
  File "/usr/local/lib/python3.10/site-packages/torch/library.py", line 467, in inner
    handle = entry.abstract_impl.register(func_to_register, source)
  File "/usr/local/lib/python3.10/site-packages/torch/_library/abstract_impl.py", line 30, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
RuntimeError: operator torchvision::nms does not exist
VasSkliris commented 3 months ago

The same result happens when I do

APPTAINERENV_CUDA_VISIBLE_DEVICES=0 apptainer run --nv $AFRAME_CONTAINER_ROOT/train.sif python -m train --config /opt/aframe/projects/train/config.yaml --data.ifos=[H1,L1] --data.data_dir ~/aframe/data/train --trainer.logger=WandbLogger --trainer.logger.project=aframe --trainer.logger.name=my-first-run --trainer.logger.save_dir=~/aframe/results/my-first-run

wbenoit26 commented 3 months ago

Can you confirm which commit you're on? I wonder if we snuck an update in which broke things for you

VasSkliris commented 3 months ago

157 813ed212adac890e3170ba65c61dac54675a0012

wbenoit26 commented 3 months ago

Actually, I'm almost certain that's what's going on. Can you try removing and rebuilding the training container?

VasSkliris commented 3 months ago

I deleted the built image and rebuilt it. The error persists.

VasSkliris commented 3 months ago

Note that this is the first time I have run this.

wbenoit26 commented 3 months ago

Strange, okay. I've done all the same steps myself and had no issue: I'm on the latest commit and I deleted and rebuilt the container, and the command worked as it should. Let me look into this further

wbenoit26 commented 3 months ago

The train.sif you're talking about is on CIT? When I run apptainer run /home/vasileios.skliris/aframe/images/train.sif python -m train --help, I get the expected help message

EthanMarx commented 3 months ago

Yeah, it doesn't look like you're actually running the container: at the top of the error message, /home/vasileios.skliris/aframev2/projects/train/train/__main__.py is being called, which is not a path inside the container.
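
The traceback mixes frames from three locations: /usr/local/lib/python3.10 (the interpreter, lightning, torchmetrics and torch), /home/vasileios.skliris/aframev2/projects/train (train), and /home/vasileios.skliris/.local/lib/python3.10/site-packages (torchvision). A minimal sketch for checking where Python would resolve each package, without importing it, might look like the following. The helper names `locate` and `report` are hypothetical and not part of aframe, and the package list is simply the names that appear in the traceback.

```python
import importlib.util
import site
import sys


def locate(name):
    """Return the path a top-level module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for broken parents or modules with no usable spec.
        return None
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no origin; fall back to their first search location.
    locations = spec.submodule_search_locations
    return next(iter(locations), None) if locations else None


def report(names):
    """Map each name to (path, True if the path lies under the user site-packages)."""
    user_site = site.getusersitepackages()
    rows = {}
    for name in names:
        path = locate(name)
        in_user_site = bool(path) and path.startswith(user_site)
        rows[name] = (path, in_user_site)
    return rows


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    # Package names taken from the traceback; adjust as needed.
    for name, (path, in_user_site) in report(["train", "torch", "torchvision"]).items():
        flag = "  <-- user site-packages" if in_user_site else ""
        print(f"{name:12s} {path}{flag}")
```

Saved under any file name and run with the container's python, this prints one line per package. For train, the result depends on the first entry of sys.path: python -m puts the current directory first, while running a script puts the script's directory first.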

wbenoit26 commented 3 months ago

Given that you're in the train directory, it looks like the first part of your command (apptainer run $AFRAME_CONTAINER_ROOT/train.sif) is being ignored for some reason, and all that's being done is python -m train --help

VasSkliris commented 3 months ago

Any idea why this would happen?

wbenoit26 commented 3 months ago

Not sure. What happens if you just run apptainer run $AFRAME_CONTAINER_ROOT/train.sif?

VasSkliris commented 3 months ago

It opens a Python interpreter.

wbenoit26 commented 3 months ago

Good, okay. If you now run train --help in that interpreter, do you get the help message?

VasSkliris commented 3 months ago

No, because it doesn't recognize train; I assume in such a case it should import something first. In comparison, when I do apptainer run $AFRAME_CONTAINER_ROOT/data.sif it does not open a Python interpreter (>>>).

wbenoit26 commented 3 months ago

Oh right, of course, --help won't do anything in the python interpreter. Are you able to import train, at least? And then running help(train) should point to a file in /opt/

VasSkliris commented 3 months ago

Yes that works

wbenoit26 commented 3 months ago

Okay, that's good. I wonder where this chain is breaking down then. What if you do apptainer run $AFRAME_CONTAINER_ROOT/train.sif python? Are you still able to import train?

VasSkliris commented 3 months ago

Yes, it still works. I think the fact that it opens an interpreter is the breaking point?

wbenoit26 commented 3 months ago

No, I believe that's expected. I don't actually know why the data container doesn't do so - maybe @EthanMarx can explain - but that's consistent with the behavior of my containers. I suppose we should confirm the scope of the issue. Can you try running apptainer run $AFRAME_CONTAINER_ROOT/data.sif python -m data --help and apptainer run $AFRAME_CONTAINER_ROOT/export.sif python -m export --help and see if either/both of those fail?

VasSkliris commented 3 months ago

Both work. Just to clarify, could I run sandbox independently of this issue or should I wait until it is resolved?

EthanMarx commented 3 months ago

This will need to be resolved. Can you retry the apptainer run $AFRAME_CONTAINER_ROOT/train.sif python -m train --help? It shouldn't be the case that Will and I can run it successfully from your container and you can't.

VasSkliris commented 3 months ago

Still the same error. Any clues what part of the setting up could have gone wrong so that this error could pop up?

wbenoit26 commented 3 months ago

My suspicion is that it's coming from not using your own miniconda install. Could you try running with the -e flag passed, i.e., apptainer run -e $AFRAME_CONTAINER_ROOT/train.sif python -m train --help?

VasSkliris commented 3 months ago

Same. I am currently using the base environment that sources miniconda3, as expected.

wbenoit26 commented 3 months ago

Let's continue to try and debug on the review call. One more thing to attempt is using a train container that I built: apptainer run /home/william.benoit/train.sif python -m train --help.

wbenoit26 commented 3 months ago

And could you also put the output of apptainer exec ~/aframe/images/train.sif env here?

VasSkliris commented 3 months ago
(base) [vasileios.skliris@ldas-pcdev2 train]$ apptainer exec ~/aframe/images/train.sif env
ACL_BOARD_VENDOR_PATH=/opt/Intel/OpenCLFPGA/oneAPI/Boards
ADVISOR_2023_DIR=/opt/intel/oneapi/advisor/2023.2.0
AFRAME_CONTAINER_ROOT=/home/vasileios.skliris/aframe/images/
APM=/opt/intel/oneapi/advisor/2023.2.0/perfmodels
APPTAINER_APPNAME=
APPTAINER_BIND=
APPTAINER_COMMAND=exec
APPTAINER_CONTAINER=/home/vasileios.skliris/aframe/images/train.sif
APPTAINER_ENVIRONMENT=/.singularity.d/env/91-environment.sh
APPTAINER_NAME=train.sif
BASH_FUNC__module_raw%%=() {  unset _mlshdbg;
 if [ "${MODULES_SILENT_SHELL_DEBUG:-0}" = '1' ]; then
 case "$-" in 
 *v*x*)
 set +vx;
 _mlshdbg='vx'
 ;;
 *v*)
 set +v;
 _mlshdbg='v'
 ;;
 *x*)
 set +x;
 _mlshdbg='x'
 ;;
 *)
 _mlshdbg=''
 ;;
 esac;
 fi;
 unset _mlre _mlIFS;
 if [ -n "${IFS+x}" ]; then
 _mlIFS=$IFS;
 fi;
 IFS=' ';
 for _mlv in ${MODULES_RUN_QUARANTINE:-};
 do
 if [ "${_mlv}" = "${_mlv##*[!A-Za-z0-9_]}" -a "${_mlv}" = "${_mlv#[0-9]}" ]; then
 if [ -n "`eval 'echo ${'$_mlv'+x}'`" ]; then
 _mlre="${_mlre:-}${_mlv}_modquar='`eval 'echo ${'$_mlv'}'`' ";
 fi;
 _mlrv="MODULES_RUNENV_${_mlv}";
 _mlre="${_mlre:-}${_mlv}='`eval 'echo ${'$_mlrv':-}'`' ";
 fi;
 done;
 if [ -n "${_mlre:-}" ]; then
 eval `eval ${_mlre} /usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash '"$@"'`;
 else
 eval `/usr/bin/tclsh /usr/share/Modules/libexec/modulecmd.tcl bash "$@"`;
 fi;
 _mlstatus=$?;
 if [ -n "${_mlIFS+x}" ]; then
 IFS=$_mlIFS;
 else
 unset IFS;
 fi;
 unset _mlre _mlv _mlrv _mlIFS;
 if [ -n "${_mlshdbg:-}" ]; then
 set -$_mlshdbg;
 fi;
 unset _mlshdbg;
 return $_mlstatus
}
BASH_FUNC_ml%%=() {  module ml "$@"
}
BASH_FUNC_module%%=() {  _module_raw "$@" 2>&1
}
BASH_FUNC_scl%%=() {  if [ "$1" = "load" -o "$1" = "unload" ]; then
 eval "module $@";
 else
 /usr/bin/scl "$@";
 fi
}
BASH_FUNC_switchml%%=() {  typeset swfound=1;
 if [ "${MODULES_USE_COMPAT_VERSION:-0}" = '1' ]; then
 typeset swname='main';
 if [ -e /usr/share/Modules/libexec/modulecmd.tcl ]; then
 typeset swfound=0;
 unset MODULES_USE_COMPAT_VERSION;
 fi;
 else
 typeset swname='compatibility';
 if [ -e /usr/share/Modules/libexec/modulecmd-compat ]; then
 typeset swfound=0;
 MODULES_USE_COMPAT_VERSION=1;
 export MODULES_USE_COMPAT_VERSION;
 fi;
 fi;
 if [ $swfound -eq 0 ]; then
 echo "Switching to Modules $swname version";
 source /usr/share/Modules/init/bash;
 else
 echo "Cannot switch to Modules $swname version, command not found";
 return 1;
 fi
}
BASH_FUNC_which%%=() {  ( alias;
 eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot $@
}
CCL_CONFIGURATION=cpu_gpu_dpcpp
CCL_ROOT=/opt/intel/oneapi/ccl/2021.10.0
CLASSPATH=/opt/intel/oneapi/dal/2023.2.0/lib/onedal.jar
CLCK_ROOT=/opt/intel/oneapi/clck/2021.7.1
CMMGR=
CMPLR_ROOT=/opt/intel/oneapi/compiler/2023.2.1
CONDA_BACKUP_ADDR2LINE=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-addr2line
CONDA_BACKUP_AR=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-ar
CONDA_BACKUP_AS=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-as
CONDA_BACKUP_BUILD=x86_64-conda-linux-gnu
CONDA_BACKUP_CC=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-cc
CONDA_BACKUP_CC_FOR_BUILD=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-cc
CONDA_BACKUP_CFLAGS=-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include
CONDA_BACKUP_CMAKE_PREFIX_PATH=/cvmfs/software.igwn.org/conda/envs/igwn:/cvmfs/software.igwn.org/conda/envs/igwn/x86_64-conda-linux-gnu/sysroot/usr
CONDA_BACKUP_CONDA_BUILD_SYSROOT=/cvmfs/software.igwn.org/conda/envs/igwn/x86_64-conda-linux-gnu/sysroot
CONDA_BACKUP_CONDA_TOOLCHAIN_BUILD=x86_64-conda-linux-gnu
CONDA_BACKUP_CONDA_TOOLCHAIN_HOST=x86_64-conda-linux-gnu
CONDA_BACKUP_CPP=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-cpp
CONDA_BACKUP_CPPFLAGS=-DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include
CONDA_BACKUP_CXX=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-c++
CONDA_BACKUP_CXXFILT=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-c++filt
CONDA_BACKUP_CXXFLAGS=-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include
CONDA_BACKUP_CXX_FOR_BUILD=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-c++
CONDA_BACKUP_DEBUG_CFLAGS=-march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include
CONDA_BACKUP_DEBUG_CPPFLAGS=-D_DEBUG -D_FORTIFY_SOURCE=2 -Og -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include -D_DEBUG -D_FORTIFY_SOURCE=2 -Og -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include
CONDA_BACKUP_DEBUG_CXXFLAGS=-fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-all -fno-plt -Og -g -Wall -Wextra -fvar-tracking-assignments -ffunction-sections -pipe -isystem /cvmfs/software.igwn.org/conda/envs/igwn/include
CONDA_BACKUP_ELFEDIT=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-elfedit
CONDA_BACKUP_GCC=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-gcc
CONDA_BACKUP_GCC_AR=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-gcc-ar
CONDA_BACKUP_GCC_NM=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-gcc-nm
CONDA_BACKUP_GCC_RANLIB=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-gcc-ranlib
CONDA_BACKUP_GPROF=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-gprof
CONDA_BACKUP_GXX=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-g++
CONDA_BACKUP_HOST=x86_64-conda-linux-gnu
CONDA_BACKUP_LD=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-ld
CONDA_BACKUP_LDFLAGS=-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/cvmfs/software.igwn.org/conda/envs/igwn/lib -Wl,-rpath-link,/cvmfs/software.igwn.org/conda/envs/igwn/lib -L/cvmfs/software.igwn.org/conda/envs/igwn/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,--allow-shlib-undefined -Wl,-rpath,/cvmfs/software.igwn.org/conda/envs/igwn/lib -Wl,-rpath-link,/cvmfs/software.igwn.org/conda/envs/igwn/lib -L/cvmfs/software.igwn.org/conda/envs/igwn/lib
CONDA_BACKUP_LD_GOLD=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-ld.gold
CONDA_BACKUP_MESON_ARGS=--buildtype release
CONDA_BACKUP_NM=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-nm
CONDA_BACKUP_OBJCOPY=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-objcopy
CONDA_BACKUP_OBJDUMP=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-objdump
CONDA_BACKUP_RANLIB=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-ranlib
CONDA_BACKUP_READELF=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-readelf
CONDA_BACKUP_SIZE=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-size
CONDA_BACKUP_STRINGS=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-strings
CONDA_BACKUP_STRIP=/cvmfs/software.igwn.org/conda/envs/igwn/bin/x86_64-conda-linux-gnu-strip
CONDA_BACKUP__CONDA_PYTHON_SYSCONFIGDATA_NAME=_sysconfigdata_x86_64_conda_cos6_linux_gnu
CONDA_BACKUP_build_alias=x86_64-conda-linux-gnu
CONDA_BACKUP_host_alias=x86_64-conda-linux-gnu
CONDA_DEFAULT_ENV=base
CONDA_EXE=/home/vasileios.skliris/.miniconda3/bin/conda
CONDA_PREFIX=/home/vasileios.skliris/.miniconda3
CONDA_PREFIX_1=/cvmfs/software.igwn.org/conda/envs/igwn
CONDA_PROMPT_MODIFIER=(base) 
CONDA_PYTHON_EXE=/home/vasileios.skliris/.miniconda3/bin/python
CONDA_SHLVL=2
CONDOR_LOCATION=/usr
CPATH=/opt/intel/oneapi/vpl/2022.2.5/include:/opt/intel/oneapi/tbb/2021.10.0/env/../include:/opt/intel/oneapi/mkl/2023.2.0/include:/opt/intel/oneapi/ipp/2021.9.0/include:/opt/intel/oneapi/ippcp/2021.8.0/include:/opt/intel/oneapi/ipp/2021.9.0/include:/opt/intel/oneapi/dpl/2022.2.0/linux/include:/opt/intel/oneapi/dpcpp-ct/2023.2.0/include:/opt/intel/oneapi/dnnl/2023.2.0/cpu_dpcpp_gpu_dpcpp/include:/opt/intel/oneapi/dev-utilities/2021.10.0/include:/opt/intel/oneapi/dal/2023.2.0/include:/opt/intel/oneapi/compiler/2023.2.1/linux/lib/oclfpga/include:/opt/intel/oneapi/ccl/2021.10.0/include/cpu_gpu_dpcpp
CPLUS_INCLUDE_PATH=/opt/intel/oneapi/clck/2021.7.1/include
CUDA_VISIBLE_DEVICES=0,1
CWB_HTML_BODY_PROD=
CWB_HTML_HEADER=
CWB_HTML_INDEX=
CWB_LAG_NUMBER=
CWB_SLAG_NUMBER=
DAALROOT=/opt/intel/oneapi/dal/2023.2.0
DALROOT=/opt/intel/oneapi/dal/2023.2.0
DAL_MAJOR_BINARY=1
DAL_MINOR_BINARY=1
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/44659/bus
DEBUGINFOD_URLS=https://debuginfod.centos.org/ 
DEFAULT_SEGMENT_SERVER=http://segments.ldas.cit
DIAGUTIL_PATH=/opt/intel/oneapi/vtune/2023.2.0/sys_check/vtune_sys_check.py:/opt/intel/oneapi/dpcpp-ct/2023.2.0/sys_check/sys_check.sh:/opt/intel/oneapi/debugger/2023.2.0/sys_check/debugger_sys_check.py:/opt/intel/oneapi/compiler/2023.2.1/sys_check/sys_check.sh:/opt/intel/oneapi/advisor/2023.2.0/sys_check/advisor_sys_check.py:
DNNLROOT=/opt/intel/oneapi/dnnl/2023.2.0/cpu_dpcpp_gpu_dpcpp
DPCT_BUNDLE_ROOT=/opt/intel/oneapi/dpcpp-ct/2023.2.0
DPL_ROOT=/opt/intel/oneapi/dpl/2022.2.0
ECP_IDP=LIGO
FPGA_VARS_ARGS=
FPGA_VARS_DIR=/opt/intel/oneapi/compiler/2023.2.1/linux/lib/oclfpga
GDB_INFO=/opt/intel/oneapi/debugger/2023.2.0/documentation/info/
GLOBUS_TCP_PORT_RANGE=40000,40500
GLOBUS_TCP_SOURCE_RANGE=40501,41000
GPG_KEY=A035C8C19219BA821ECEA86B64E628F8D684696D
GWDATAFIND_SERVER=datafind.ldas.cit:80
HISTCONTROL=ignoredups
HISTSIZE=1000
HOME=/home/vasileios.skliris
HOSTNAME=ldas-pcdev2
IGWN_POOL=condor.igwn.org
INFOPATH=/opt/intel/oneapi/debugger/2023.2.0/gdb/intel64/lib
INSPECTOR_2023_DIR=/opt/intel/oneapi/inspector/2023.2.0
INTELFPGAOCLSDKROOT=/opt/intel/oneapi/compiler/2023.2.1/linux/lib/oclfpga
INTEL_LICENSE_FILE=/opt/intel/licenses:/home/vasileios.skliris/intel/licenses:/opt/intel/oneapi/clck/2021.7.1/licensing:/opt/intel/licenses:/home/vasileios.skliris/intel/licenses:/Users/Shared/Library/Application Support/Intel/Licenses
INTEL_PYTHONHOME=/opt/intel/oneapi/debugger/2023.2.0/dep
IPPCP_TARGET_ARCH=intel64
IPPCRYPTOROOT=/opt/intel/oneapi/ippcp/2021.8.0
IPPROOT=/opt/intel/oneapi/ipp/2021.9.0
IPP_TARGET_ARCH=intel64
JAVA_HOME=/usr/lib/jvm/java/jre
KRB5_KTNAME=/home/vasileios.skliris/.kerberos/ligo.org.keytab
LAL_DATA_PATH=/scratch/lalsimulation
LANG=C.UTF-8
LD_LIBRARY_PATH=/.singularity.d/libs
LESSOPEN=||/usr/bin/lesspipe.sh %s
LIBRARY_PATH=/opt/intel/oneapi/vpl/2022.2.5/lib:/opt/intel/oneapi/tbb/2021.10.0/env/../lib/intel64/gcc4.8:/opt/intel/oneapi/mkl/2023.2.0/lib/intel64:/opt/intel/oneapi/ipp/2021.9.0/lib/intel64:/opt/intel/oneapi/ippcp/2021.8.0/lib/intel64:/opt/intel/oneapi/ipp/2021.9.0/lib/intel64:/opt/intel/oneapi/dnnl/2023.2.0/cpu_dpcpp_gpu_dpcpp/lib:/opt/intel/oneapi/dal/2023.2.0/lib/intel64:/opt/intel/oneapi/compiler/2023.2.1/linux/compiler/lib/intel64_lin:/opt/intel/oneapi/compiler/2023.2.1/linux/lib:/opt/intel/oneapi/clck/2021.7.1/lib/intel64:/opt/intel/oneapi/ccl/2021.10.0/lib/cpu_gpu_dpcpp
LIGO_DATAFIND_SERVER=datafind.ldas.cit:80
LIGO_USERNAME=vasileios.skliris
LOADEDMODULES=
LOGNAME=vasileios.skliris
LS_COLORS=rs=0:di=38;5;33:ln=38;5;51:mh=00:pi=40;38;5;11:so=38;5;13:do=38;5;5:bd=48;5;232;38;5;11:cd=48;5;232;38;5;3:or=48;5;232;38;5;9:mi=01;05;37;41:su=48;5;196;38;5;15:sg=48;5;11;38;5;16:ca=48;5;196;38;5;226:tw=48;5;10;38;5;16:ow=48;5;10;38;5;21:st=48;5;21;38;5;15:ex=38;5;40:*.tar=38;5;9:*.tgz=38;5;9:*.arc=38;5;9:*.arj=38;5;9:*.taz=38;5;9:*.lha=38;5;9:*.lz4=38;5;9:*.lzh=38;5;9:*.lzma=38;5;9:*.tlz=38;5;9:*.txz=38;5;9:*.tzo=38;5;9:*.t7z=38;5;9:*.zip=38;5;9:*.z=38;5;9:*.dz=38;5;9:*.gz=38;5;9:*.lrz=38;5;9:*.lz=38;5;9:*.lzo=38;5;9:*.xz=38;5;9:*.zst=38;5;9:*.tzst=38;5;9:*.bz2=38;5;9:*.bz=38;5;9:*.tbz=38;5;9:*.tbz2=38;5;9:*.tz=38;5;9:*.deb=38;5;9:*.rpm=38;5;9:*.jar=38;5;9:*.war=38;5;9:*.ear=38;5;9:*.sar=38;5;9:*.rar=38;5;9:*.alz=38;5;9:*.ace=38;5;9:*.zoo=38;5;9:*.cpio=38;5;9:*.7z=38;5;9:*.rz=38;5;9:*.cab=38;5;9:*.wim=38;5;9:*.swm=38;5;9:*.dwm=38;5;9:*.esd=38;5;9:*.jpg=38;5;13:*.jpeg=38;5;13:*.mjpg=38;5;13:*.mjpeg=38;5;13:*.gif=38;5;13:*.bmp=38;5;13:*.pbm=38;5;13:*.pgm=38;5;13:*.ppm=38;5;13:*.tga=38;5;13:*.xbm=38;5;13:*.xpm=38;5;13:*.tif=38;5;13:*.tiff=38;5;13:*.png=38;5;13:*.svg=38;5;13:*.svgz=38;5;13:*.mng=38;5;13:*.pcx=38;5;13:*.mov=38;5;13:*.mpg=38;5;13:*.mpeg=38;5;13:*.m2v=38;5;13:*.mkv=38;5;13:*.webm=38;5;13:*.ogm=38;5;13:*.mp4=38;5;13:*.m4v=38;5;13:*.mp4v=38;5;13:*.vob=38;5;13:*.qt=38;5;13:*.nuv=38;5;13:*.wmv=38;5;13:*.asf=38;5;13:*.rm=38;5;13:*.rmvb=38;5;13:*.flc=38;5;13:*.avi=38;5;13:*.fli=38;5;13:*.flv=38;5;13:*.gl=38;5;13:*.dl=38;5;13:*.xcf=38;5;13:*.xwd=38;5;13:*.yuv=38;5;13:*.cgm=38;5;13:*.emf=38;5;13:*.ogv=38;5;13:*.ogx=38;5;13:*.aac=38;5;45:*.au=38;5;45:*.flac=38;5;45:*.m4a=38;5;45:*.mid=38;5;45:*.midi=38;5;45:*.mka=38;5;45:*.mp3=38;5;45:*.mpc=38;5;45:*.ogg=38;5;45:*.ra=38;5;45:*.wav=38;5;45:*.oga=38;5;45:*.opus=38;5;45:*.spx=38;5;45:*.xspf=38;5;45:
MAIL=/var/spool/mail/vasileios.skliris
MANPATH=/opt/intel/oneapi/itac/2021.10.0/man:/opt/intel/oneapi/debugger/2023.2.0/documentation/man:/opt/intel/oneapi/compiler/2023.2.1/documentation/en/man/common:/opt/intel/oneapi/clck/2021.7.1/man:/man::/opt/puppetlabs/puppet/share/man
MKLROOT=/opt/intel/oneapi/mkl/2023.2.0
MODULEPATH=/etc/scl/modulefiles:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles
MODULEPATH_modshare=/usr/share/Modules/modulefiles:2:/etc/modulefiles:2:/usr/share/modulefiles:2
MODULESHOME=/usr/share/Modules
MODULES_CMD=/usr/share/Modules/libexec/modulecmd.tcl
MODULES_RUN_QUARANTINE=LD_LIBRARY_PATH LD_PRELOAD
NDSSERVER=nds.ldas.cit:31200
NLSPATH=/opt/intel/oneapi/mkl/2023.2.0/lib/intel64/locale/%l_%t/%N:/opt/intel/oneapi/compiler/2023.2.1/linux/compiler/lib/intel64_lin/locale/%l_%t/%N
OCL_ICD_FILENAMES=libintelocl_emu.so:libalteracl.so:/opt/intel/oneapi/compiler/2023.2.1/linux/lib/x64/libintelocl.so
OLDPWD=/home/vasileios.skliris/aframev2/projects
ONEAPI_ROOT=/opt/intel/oneapi
ONLINEDQ=/online/DQ
ONLINEHOFT=/online/frames/hoft
PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PKG_CONFIG_PATH=/opt/intel/oneapi/vtune/2023.2.0/include/pkgconfig/lib64:/opt/intel/oneapi/vpl/2022.2.5/lib/pkgconfig:/opt/intel/oneapi/tbb/2021.10.0/env/../lib/pkgconfig:/opt/intel/oneapi/mkl/2023.2.0/lib/pkgconfig:/opt/intel/oneapi/ippcp/2021.8.0/lib/pkgconfig:/opt/intel/oneapi/inspector/2023.2.0/include/pkgconfig/lib64:/opt/intel/oneapi/dpl/2022.2.0/lib/pkgconfig:/opt/intel/oneapi/dnnl/2023.2.0/cpu_dpcpp_gpu_dpcpp/../lib/pkgconfig:/opt/intel/oneapi/dal/2023.2.0/lib/pkgconfig:/opt/intel/oneapi/compiler/2023.2.1/lib/pkgconfig:/opt/intel/oneapi/ccl/2021.10.0/lib/pkgconfig:/opt/intel/oneapi/advisor/2023.2.0/include/pkgconfig/lib64:
PROMPT_COMMAND=PS1="Apptainer> "; unset PROMPT_COMMAND
PS1=Apptainer> 
PWD=/home/vasileios.skliris/aframev2/projects/train
PYTHONPATH=/opt/intel/oneapi/advisor/2023.2.0/pythonapi
PYTHON_GET_PIP_SHA256=45a2bb8bf2bb5eff16fdd00faef6f29731831c7c59bd9fc2bf1f3bed511ff1fe
PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/9af82b715db434abb94a0a6f3569f43e72157346/public/get-pip.py
PYTHON_PIP_VERSION=23.0.1
PYTHON_SETUPTOOLS_VERSION=65.5.1
PYTHON_VERSION=3.10.12
S6_SEGMENT_SERVER=http://segments-s6.ldas.cit
SETVARS_COMPLETED=1
SHELL=/bin/bash
SHLVL=1
SINGULARITY_BIND=
SINGULARITY_CONTAINER=/home/vasileios.skliris/aframe/images/train.sif
SINGULARITY_ENVIRONMENT=/.singularity.d/env/91-environment.sh
SINGULARITY_NAME=train.sif
SSH_CLIENT=81.98.80.220 39844 22
SSH_CONNECTION=81.98.80.220 39844 131.215.113.172 22
SSH_TTY=/dev/pts/11
S_COLORS=auto
TBBROOT=/opt/intel/oneapi/tbb/2021.10.0/env/..
TERM=xterm-256color
TF_FORCE_GPU_ALLOW_GROWTH=true
TMOUT=259200
TMPDIR=/local/vasileios.skliris
USER=vasileios.skliris
USER_PATH=/usr/share/Modules/bin:/opt/intel/oneapi/vtune/2023.2.0/bin64:/opt/intel/oneapi/vpl/2022.2.5/bin:/opt/intel/oneapi/mkl/2023.2.0/bin/intel64:/opt/intel/oneapi/itac/2021.10.0/bin:/opt/intel/oneapi/inspector/2023.2.0/bin64:/opt/intel/oneapi/dpcpp-ct/2023.2.0/bin:/opt/intel/oneapi/dev-utilities/2021.10.0/bin:/opt/intel/oneapi/debugger/2023.2.0/gdb/intel64/bin:/opt/intel/oneapi/compiler/2023.2.1/linux/lib/oclfpga/bin:/opt/intel/oneapi/compiler/2023.2.1/linux/bin/intel64:/opt/intel/oneapi/compiler/2023.2.1/linux/bin:/opt/intel/oneapi/clck/2021.7.1/bin/intel64:/opt/intel/oneapi/advisor/2023.2.0/bin64:/cvmfs/software.igwn.org/conda/envs/igwn-py310-20240514/epics/bin/linux-x86_64:/home/vasileios.skliris/.miniconda3/bin:/cvmfs/software.igwn.org/conda/condabin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/ldcg/matlab_r2020b/bin:/opt/dcs/bin:/opt/puppetlabs/bin:/home/vasileios.skliris/.local/bin:/home/vasileios.skliris/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
VTUNE_PROFILER_2023_DIR=/opt/intel/oneapi/vtune/2023.2.0
VTUNE_PROFILER_DIR=/opt/intel/oneapi/vtune/2023.2.0
VT_ADD_LIBS=-ldwarf -lelf -lvtunwind -lm -lpthread
VT_LIB_DIR=/opt/intel/oneapi/itac/2021.10.0/lib
VT_MPI=impi4
VT_ROOT=/opt/intel/oneapi/itac/2021.10.0
VT_SLIB_DIR=/opt/intel/oneapi/itac/2021.10.0/slib
X509_USER_PROXY=/home/vasileios.skliris/cilogon_cert/CERT_KEY.pem
XDG_RUNTIME_DIR=/run/user/44659
XDG_SESSION_ID=319223
XLA_PYTHON_CLIENT_PREALLOCATE=false
_=/usr/bin/apptainer
_CE_CONDA=
_CE_M=
_USE_ROOT6=1
tmpdir=/local/vasileios.skliris
which_declare=declare -f
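
The dump above includes variables such as PYTHONPATH and CONDA_PREFIX that appear to come from the host login environment. A small sketch that prints only the settings influencing Python's import path, meant to be run with the container's interpreter, could look like this. The helper name `python_env_summary` is hypothetical, and the variable list is a selection of standard CPython variables rather than anything aframe-specific.

```python
import os
import site
import sys

# Standard CPython environment variables that affect module lookup.
PYTHON_VARS = ("PYTHONPATH", "PYTHONHOME", "PYTHONUSERBASE", "PYTHONNOUSERSITE")


def python_env_summary(environ=None):
    """Collect import-related settings from an environment mapping and the running interpreter."""
    environ = os.environ if environ is None else environ
    summary = {var: environ.get(var) for var in PYTHON_VARS}
    # The entries below describe the interpreter executing this code,
    # regardless of which mapping was passed in.
    summary["user_site_enabled"] = site.ENABLE_USER_SITE
    summary["user_site"] = site.getusersitepackages()
    summary["sys_path"] = list(sys.path)
    return summary


if __name__ == "__main__":
    for key, value in python_env_summary().items():
        print(f"{key}: {value}")
```

If user_site_enabled is True and user_site lies under a home directory that is visible inside the container, packages installed on the host with pip install --user can be picked up by the container's interpreter. Setting PYTHONNOUSERSITE=1 is the standard CPython switch that disables the user site directory; whether it changes the behaviour here was not tested in this thread.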
VasSkliris commented 3 months ago

Update: Reinstalling miniconda3 in a non-dot directory still produces the error above.

EthanMarx commented 3 months ago

@VasSkliris Let's find a time to hop on a zoom and debug this next week!

VasSkliris commented 3 months ago

I would be able to meet at 11 your time for a bit, if you are available this week.

EthanMarx commented 3 months ago

@VasSkliris Does 11 EDT work today?

VasSkliris commented 3 months ago

Yes, I will join the review Zoom.

wbenoit26 commented 3 months ago

Is this meeting happening? No one else is in the review zoom

EthanMarx commented 1 month ago

Closing this, I think, since we isolated it to a specific cluster / user environment.