Closed szakallasn3 closed 1 year ago
Hi @szakallasn3, what GPU does the device you are running the workflow on have? The error seems to imply that Docker is not able to schedule the basecalling tasks to a GPU.
Thanks for your quick reply and for the tip. I have now specified the GPU the same way I did previously for Guppy basecalling, using the --cuda_device option; however, the following error remained:
Error executing process > 'lookup_clair3_model (1)'
Caused by:
Process lookup_clair3_model (1)
terminated with an error exit status (65)
Command executed:
clair3_model=$(resolve_clair3_model.py lookup_table 'dna_r10.4.1_e8.2_400bps_hac@v4.0.0')
cp -r ${CLAIR_MODELS_PATH}/${clair3_model} model
Command exit status: 65
Command output: (empty)
[CRITICAL ERROR] Unknown basecaller configuration.
The input basecaller configuration 'dna_r10.4.1_e8.2_400bps_hac@v4.0.0' does not have a suitable Clair3 model because the basecaller configuration has not been recognised.
Check your --basecaller_cfg has been provided correctly.
I checked and the --basecaller_cfg is provided correctly.
Do you have any suggestions?
@szakallasn3 Please update to wf-human-variation v1.1.0, where that model was added to the Clair3 lookup. If you're using Nextflow to manage your workflows, you can update with nextflow pull epi2me-labs/wf-human-variation.
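One thing worth checking after the update: the command in the original report pins the release with -r v1.0.1, and a pinned -r keeps selecting that release even after a pull. The sketch below is only an illustration of the update-and-check step. The helper names (update_workflow, version_ge) are made up for this example, and version_ge assumes GNU sort with -V support.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; only the nextflow commands themselves are real.

# Fetch the latest copy of the workflow and list the revisions available locally.
update_workflow() {
  nextflow pull epi2me-labs/wf-human-variation
  nextflow info epi2me-labs/wf-human-variation
}

# Succeeds (exit 0) if revision $1 is at least revision $2, e.g. v1.1.0 vs v1.1.0.
# Assumption: GNU sort with -V (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "${1#v}" "${2#v}" | sort -V | head -n 1)" = "${2#v}" ]
}

# Example: the revision pinned in the original command is too old for this model.
if ! version_ge "v1.0.1" "v1.1.0"; then
  echo "Pinned revision predates v1.1.0; rerun with -r v1.1.0"
fi
```

After the pull, the run command would then use -r v1.1.0 in place of -r v1.0.1, with the other options unchanged.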
I made the update, however the error messages remained - the basecaller and clair3 model problems shown on screenshots.
Hi @szakallasn3, would you mind sharing the latest stdout (the one with the big EPI2ME-labs logo), just so I can confirm the right version is loaded and to confirm your parameters.
Sure:
I also attached the error message.
Thanks for your help!
@szakallasn3 This error still indicates that Docker is not able to run containers with a GPU. Your device is likely missing the nvidia-container-toolkit. Please follow the instructions here to install the nvidia-container-toolkit, working through each of the steps listed there.
Once you have followed those steps in the linked documentation, you should be able to run this workflow.
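A quick way to confirm that Docker can reach the GPU before rerunning the workflow is to start a throwaway container that calls nvidia-smi. This is a minimal sketch: the function name check_docker_gpu and the DOCKER override variable are assumptions added for this example (the override only allows a dry run), while the docker invocation itself is the standard smoke test for the NVIDIA container toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run nvidia-smi inside a plain Ubuntu container.
# If Docker can schedule containers on a GPU, this prints the usual nvidia-smi table;
# if the toolkit is missing, docker exits with an error instead.
# DOCKER is an assumed override for dry runs, e.g. DOCKER=echo check_docker_gpu
check_docker_gpu() {
  "${DOCKER:-docker}" run --rm --gpus all ubuntu nvidia-smi
}
```

Calling check_docker_gpu with no override runs the real check. If it fails, the workflow's basecalling step can be expected to fail in the same way.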
Many thanks for your help. I did what you suggested; however, I'm now facing the following problem:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 1; 31.75 GiB total capacity; 0 bytes already allocated; 6.62 MiB free; 0 bytes reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I do have enough memory, though. As the error message suggests, I changed max_split_size_mb to avoid fragmentation, but after running the wf-human-variation command again I got the same error message. Do you have any suggestions? I have read several GitHub issues and Stack Overflow posts where this problem was discussed, but unfortunately none of the tips worked for me.
@szakallasn3 Glad that you can get Docker with GPU started now! Your new error implies that your GPU is using up all its memory doing something else. You can use the nvidia-smi command to check what tasks are running on your GPU and how much free memory it has.
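The error above reports 6.62 MiB free out of 31.75 GiB on GPU 1, with 0 bytes allocated by PyTorch, which is consistent with another process holding the memory. The nvidia-smi queries below are one way to look at this. The function names are invented for this sketch, and the last helper assumes its input comes from nvidia-smi with --format=csv,noheader,nounits.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; the nvidia-smi query flags are real.

# Per-GPU memory usage.
gpu_memory_report() {
  nvidia-smi --query-gpu=index,name,memory.used,memory.free,memory.total --format=csv
}

# Compute processes currently holding GPU memory.
gpu_process_report() {
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
}

# Read "index, free_MiB" lines on stdin and print the index of the GPU
# with the most free memory (prints nothing on empty input).
gpu_with_most_free_memory() {
  awk -F', *' '$2 + 0 > best || NR == 1 { best = $2 + 0; idx = $1 } END { if (NR) print idx }'
}

# Usage (needs an NVIDIA driver installed):
#   nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits | gpu_with_most_free_memory
```

If the process report shows something else occupying GPU 1, stopping that process or pointing the workflow at a different device would be the next thing to try.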
I'm closing this old issue but please re-open if you are still running into trouble.
What happened?
Hello everyone!
I ran EPI2ME Labs' human variation workflow for the first time and ran into Dorado basecalling and Clair3 problems. The error message was the following:
Error executing process > 'basecalling:wf_dorado:dorado (84)', Process 'basecalling:wf_dorado:dorado (84)' terminated with an error exit status (125) AND Error executing process > 'lookup_clair3_model (1)'.
This is a little confusing for me, because I followed the recommended steps from https://github.com/epi2me-labs/wf-human-variation and checked and used the available models for the Dorado basecaller and Clair3. The workflow was started in a Nextflow environment.
The terminal command line was: nextflow run epi2me-labs/wf-human-variation -r v1.0.1 -w clair3 -profile standard --snp --sv --methyl --fast_dir 'path_to_fast_dir' --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_sup@v3.5.2' --remora_cfg 'r1041_e82_400bps_sup@g632' --ref 'path_to_reference_genome' --out_dir 'path_to_out_dir'
Some screenshots are attached to this issue, I hope they also help in solving this.
If anyone has faced this or a similar problem before and has a solution or any ideas, please let me know. Many thanks for your help!
Operating System
Ubuntu 18.04
Workflow Execution
Command line
Workflow Execution - EPI2ME Labs Versions
No response
Workflow Execution - CLI Execution Profile
None
Workflow Version
v1.0.1
Relevant log output