JDACS4C-IMPROVE / Singularity

Singularity definitions that can be extended to support execution of community models.
MIT License

Test SWnet #55

Closed: priyanka9991 closed this issue 12 months ago

priyanka9991 commented 1 year ago

Commands:

Within the container:
1. Build the container: copy Singularity_gpu_fix.sh from /Singularity/src to the working directory, then change line 8 in SWnet.def to point to the correct path of Singularity_gpu_fix.sh.
2. Run:
   singularity exec --nv SWnet.sif train.sh 1 /tmp/pvasanthakumari --epochs 1
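To reproduce the per-GPU behaviour reported later in this thread, the run command above can be repeated for several GPU indices and the outcome recorded. This is a minimal sketch under two assumptions: the first argument to train.sh is the GPU index (the value used for CUDA_VISIBLE_DEVICES), and the second is the data directory. The function names and the SWNET_RUNNER override are hypothetical. The override exists only so the loop can be tried without a GPU or Singularity.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one short training job inside the container.
# Assumption: train.sh takes <gpu_id> <data_dir> followed by extra model flags.
run_in_container() {
  local gpu_id="$1" data_dir="$2"
  singularity exec --nv SWnet.sif train.sh "$gpu_id" "$data_dir" --epochs 1
}

# Run the job once per GPU id and print one status line per id.
# Set SWNET_RUNNER to another function name to swap in a different runner
# (hypothetical override, useful for dry runs).
sweep_gpus() {
  local data_dir="$1"
  shift
  local runner="${SWNET_RUNNER:-run_in_container}"
  local gpu_id
  for gpu_id in "$@"; do
    # Discard the job's own output; only the exit status matters here.
    if "$runner" "$gpu_id" "$data_dir" >/dev/null 2>&1; then
      echo "GPU $gpu_id: ok"
    else
      echo "GPU $gpu_id: failed"
    fi
  done
}
```

For example, `sweep_gpus /tmp/pvasanthakumari 0 1 3 4` prints one `ok` or `failed` line per GPU index.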

Outside the container:
1. Install the environment using environment.yaml:
   conda env create -f environment.yaml
   conda activate swnet
   pip install numpy==1.17   # Although the original README recommends version 1.16.2, the CANDLE installation requires at least 1.17
2. Modify train.sh to set:
   CANDLE_MODEL=SWnet_CCLE_baseline_pytorch.py
3. Run:
   bash train.sh 1 /tmp/pvasanthakumari
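Because numpy is pinned to 1.17 to satisfy CANDLE's minimum, a quick check after activating the environment can confirm that the installed version meets that floor. This is a sketch: `version_ge` and `check_numpy` are hypothetical helpers, and the comparison relies on `sort -V` (version sort), which GNU coreutils provides but some older BSD/macOS `sort` builds may not.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2.
version_ge() {
  # sort -V orders version strings; if $2 sorts first (or they are equal), $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical check of the numpy version in the active environment
# (run after `conda activate swnet`).
check_numpy() {
  local required="1.17"
  local installed
  installed="$(python -c 'import numpy; print(numpy.__version__)')" || return 1
  if version_ge "$installed" "$required"; then
    echo "numpy $installed OK (>= $required)"
  else
    echo "numpy $installed is older than $required" >&2
    return 1
  fi
}
```

A plain string comparison would wrongly rank 1.9 above 1.17; version sort compares each numeric component instead.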

URL: https://github.com/JDACS4C-IMPROVE/SWnet/tree/develop

Status (both within and outside the container): RuntimeError: CUDA error: invalid device ordinal

[Screenshot, 2023-09-13, 2:35:41 PM]
priyanka9991 commented 1 year ago

The model runs successfully and prints IMPROVE_RESULT when CUDA_VISIBLE_DEVICES is set to 0 on the lambda0 machine, but fails for any other value of CUDA_VISIBLE_DEVICES.
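One explanation consistent with this pattern, though not confirmed in the thread, is device renumbering. CUDA_VISIBLE_DEVICES restricts which GPUs a process can see and renumbers them from 0, so with CUDA_VISIBLE_DEVICES=3 the process sees a single device with ordinal 0. If the code then also requests device 3 (for example through a hard-coded or passed-through index), CUDA reports "invalid device ordinal". CUDA_VISIBLE_DEVICES=0 would still work, because ordinal 0 exists. The hypothetical helper below models that mapping for plain integer lists. It ignores GPU UUIDs, which CUDA_VISIBLE_DEVICES also accepts.

```shell
#!/usr/bin/env bash
# Hypothetical model of CUDA_VISIBLE_DEVICES renumbering:
# ordinal k inside the process maps to the k-th entry of the list.
# Simplification: assumes a comma-separated list of integer indices (no UUIDs).
physical_gpu() {
  local visible="$1" ordinal="$2"
  local idx=0 entry
  local IFS=','
  for entry in $visible; do
    if [ "$idx" -eq "$ordinal" ]; then
      echo "$entry"
      return 0
    fi
    idx=$((idx + 1))
  done
  # No entry at that position: this is the situation CUDA reports as an error.
  echo "invalid device ordinal: $ordinal (only $idx device(s) visible)" >&2
  return 1
}
```

Here `physical_gpu 3 0` prints 3, while `physical_gpu 3 3` fails. If this hypothesis holds, the usual remedy is to address the single visible GPU as ordinal 0 (for example `cuda:0` in PyTorch) whenever CUDA_VISIBLE_DEVICES is set, rather than reusing the physical index.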

rajeeja commented 1 year ago


That might be because the machine is busy!

priyanka9991 commented 1 year ago

I tried idle GPUs as well (1, 3, 4), but I keep getting the error except when using CUDA_VISIBLE_DEVICES=0. This is the current status on lambda0:
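Because idle GPUs (1, 3, 4) still failed, contention alone does not explain the error. Checking idleness before each run still keeps a per-GPU comparison fair. This is a minimal sketch that assumes nvidia-smi's CSV query mode (`--query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits`), which prints one `index, utilization, memory used (MiB)` row per GPU. The `idle_gpus` name and its default thresholds (5% utilization, 500 MiB used) are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rows "index, utilization_percent, memory_used_MiB"
# on stdin and print the indices at or below both thresholds.
# Default thresholds are illustrative assumptions, not values from the thread.
idle_gpus() {
  local max_util="${1:-5}" max_mem_mib="${2:-500}"
  awk -F', *' -v u="$max_util" -v m="$max_mem_mib" \
    '$2 + 0 <= u + 0 && $3 + 0 <= m + 0 { print $1 }'
}

# On a real machine, feed it from nvidia-smi:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
#     --format=csv,noheader,nounits | idle_gpus
```

Reading from stdin keeps the parsing separate from nvidia-smi, so the filter can be checked against saved output on a machine without GPUs.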

[Screenshot, 2023-09-13, 3:51:40 PM]
wilke commented 12 months ago

I am closing this issue. We will restart testing from scratch for the next release.