Closed priyanka9991 closed 12 months ago
Model runs successfully and prints IMPROVE_RESULT when CUDA_VISIBLE_DEVICES is set to 0 on the lambda0 machine, but fails for any other value of CUDA_VISIBLE_DEVICES.
That might be because the machine is busy!
I tried idle GPUs as well (1, 3, 4), but I keep getting the error except when using CUDA_VISIBLE_DEVICES=0. This is the current status on lambda0:
I am closing this issue. We will restart testing from scratch for the next release.
Commands:
Within container:
Build container: copy Singularity_gpu_fix.sh from /Singularity/src to the working directory. Change line 8 in SWnet.def to point to the correct path of Singularity_gpu_fix.sh
Run: singularity exec --nv SWnet.sif train.sh 1 /tmp/pvasanthakumari --epochs 1
Outside container:
Install environment using environment.yaml:
conda env create -f environment.yaml
conda activate swnet
pip install numpy==1.17 # Although original ReadMe recommends version 1.16.2, CANDLE installation requires at least 1.17
Modify train.sh: CANDLE_MODEL=SWnet_CCLE_baseline_pytorch.py
Run: bash train.sh 1 /tmp/pvasanthakumari
URL: https://github.com/JDACS4C-IMPROVE/SWnet/tree/develop
Status (both within and outside container): RuntimeError: CUDA error: invalid device ordinal
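One possible explanation, not confirmed in this thread: CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES starting from 0 inside the process, so with CUDA_VISIBLE_DEVICES=1 the only valid ordinal is 0, and code that still asks for device 1 would get "invalid device ordinal". The sketch below shows that mapping in plain Python. The helper name logical_device_index is hypothetical (it is not part of SWnet or CANDLE), and it assumes the variable holds integer indices rather than GPU UUIDs.

```python
import os


def logical_device_index(physical_id, visible=None):
    """Map a physical GPU id to the ordinal the process sees.

    Hypothetical helper for illustration only. `visible` is the value of
    CUDA_VISIBLE_DEVICES; if omitted, it is read from the environment.
    Assumes integer indices (UUID entries are not handled).
    """
    if visible is None:
        visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # Variable unset: all GPUs are visible and numbering is unchanged.
        return physical_id

    # Visible devices are renumbered 0, 1, 2, ... in the order listed.
    ids = [int(tok) for tok in visible.split(",") if tok.strip()]
    if physical_id not in ids:
        raise ValueError(
            f"GPU {physical_id} is not visible (CUDA_VISIBLE_DEVICES={visible!r})"
        )
    return ids.index(physical_id)


# With only GPU 1 exposed, the process must address it as ordinal 0.
print(logical_device_index(1, "1"))      # 0
# With GPUs 1, 3, 4 exposed, physical GPU 3 becomes ordinal 1.
print(logical_device_index(3, "1,3,4"))  # 1
```

If this is what is happening here, the model script would need to select ordinal 0 (for example torch.device("cuda:0") in PyTorch) whenever train.sh exposes a single GPU, regardless of which physical GPU was requested.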