CUDA version if Linux
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
Hi,
My job failed with an error message like "CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered".
Is this due to a shortage of GPU memory?
I ran the job on a server with two Quadro RTX 8000 GPUs. Because I was allowed to use only one of them, I ran the command below before running colabfold_batch.
export CUDA_VISIBLE_DEVICES=0
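Since the nohup log further down mentions both GPU 0 and GPU 1, it may be worth confirming that the setting actually reaches the processes launched from this shell. The following is only a sketch of such a check: the variable name visible_in_child is mine, and the nvidia-smi call is optional and assumes the NVIDIA driver tools are installed.

```shell
# Restrict CUDA to the first GPU for this shell and everything it launches.
export CUDA_VISIBLE_DEVICES=0

# Confirm that a child process inherits the setting.
visible_in_child=$(sh -c 'printf "%s" "$CUDA_VISIBLE_DEVICES"')
echo "CUDA_VISIBLE_DEVICES seen by child processes: ${visible_in_child}"

# Optional: show per-GPU memory use if nvidia-smi is available.
# Note: nvidia-smi lists every GPU regardless of CUDA_VISIBLE_DEVICES.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv || true
fi
```

The export has to happen in the same shell session that later starts colabfold_batch; otherwise the job will not see it.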
My main command is below.
nohup colabfold_batch Hexamer.faa Hexamer.ColabFold --num-recycle 3 > nohup.log 2>&1 &
Below is the whole "log.txt" file created within the "Hexamer.ColabFold" directory.
2024-06-01 13:39:23,688 Running colabfold 1.5.5 (1648d2335943f9a483b6a803ebaea3e76162c788)
2024-06-01 13:39:23,887 Running on GPU
2024-06-01 13:39:24,307 Found 5 citations for tools or databases
2024-06-01 13:39:24,307 Query 1/1: Hexamer (length 5856)
2024-06-01 13:39:25,934 Setting max_seq=508, max_extra_seq=828
2024-06-01 13:59:21,926 Could not predict Hexamer. Not Enough GPU memory? INTERNAL: Failed to enqueue async memset operation: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2024-06-01 13:59:21,926 Done
Below is the whole "nohup.log" file.
nohup: ignoring input
WARNING: You are welcome to use the default MSA server, however keep in mind that it's a
limited shared resource only capable of processing a few thousand MSAs per day. Please
submit jobs only from a single IP address. We reserve the right to limit access to the
server case-by-case when usage exceeds fair use. If you require more MSAs: You can
precompute all MSAs with `colabfold_search` or host your own API and pass it to `--host-url`
2024-06-01 13:39:23,688 Running colabfold 1.5.5 (1648d2335943f9a483b6a803ebaea3e76162c788)
2024-06-01 13:39:23,887 Running on GPU
2024-06-01 13:39:24,307 Found 5 citations for tools or databases
2024-06-01 13:39:24,307 Query 1/1: Hexamer (length 5856)
0%| | 0/150 [elapsed: 00:00 remaining: ?]
SUBMIT: 0%| | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 0%| | 0/150 [elapsed: 00:00 remaining: ?]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
COMPLETE: 100%|██████████| 150/150 [elapsed: 00:00 remaining: 00:00]
E0601 13:59:21.874842 93333 gpu_timer.cc:156] INTERNAL: Could not synchronize CUDA stream: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0601 13:59:21.874864 93333 gpu_timer.cc:162] INTERNAL: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0601 13:59:21.874866 93333 gpu_timer.cc:168] INTERNAL: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
E0601 13:59:21.895475 93333 se_gpu_pjrt_client.cc:634] Failed to query available memory for GPU 0
E0601 13:59:21.895972 93333 se_gpu_pjrt_client.cc:634] Failed to query available memory for GPU 1
2024-06-01 13:39:25,934 Setting max_seq=508, max_extra_seq=828
2024-06-01 13:59:21,926 Could not predict 2901385346_Hexamer. Not Enough GPU memory? INTERNAL: Failed to enqueue async memset operation: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2024-06-01 13:59:21,926 Done
The input was a homohexamer with a total length of 5,856 aa.
A job with a homopentamer of the same protein (4,880 aa) finished successfully.
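To put the two jobs side by side, here is a rough size comparison. It assumes that the largest activations in AlphaFold-style models (the pair representation) grow with the square of the total sequence length. The variable names are mine, and the result is only a relative estimate, not a measured memory figure.

```shell
# Total lengths taken from the two jobs described above.
hexamer_len=5856     # failed
pentamer_len=4880    # finished successfully

# Per-chain length (both inputs are homo-oligomers of the same protein).
chain_len=$(( hexamer_len / 6 ))

# Relative size of an L x L array, hexamer vs. pentamer, as a percentage.
# Assumption: memory for the pair representation scales as L^2.
pair_ratio_pct=$(( hexamer_len * hexamer_len * 100 / (pentamer_len * pentamer_len) ))

echo "chain length: ${chain_len} aa"
echo "hexamer/pentamer L^2 ratio: ${pair_ratio_pct}%"
```

Under that assumption the hexamer needs about 1.44 times the pair-representation memory of the pentamer, which would be consistent with a memory shortage but does not prove it.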
Computational environment
Thanks.