PengNi / deepsignal-plant

Detecting methylation using signal-level features from Nanopore sequencing reads of plants
GNU General Public License v3.0

call_mods on gpu #6

Closed MunissaSadykova closed 3 years ago

MunissaSadykova commented 3 years ago

Hello! I've been trying to use deepsignal-plant for my analysis recently, and I'm having trouble calling CHH modifications on a GPU. I'm using the command suggested here, and it starts working at first, but then after a while it stops. The job doesn't fail (it shows as running and using some memory) until it times out, and no error message is produced in the log file, but I can see that the output file is no longer being modified. I tried both my own data and the sample data, and I'm having similar issues. Previously, calling CHH modifications worked when I ran it on a CPU using the extracted-feature file, but it took around 10 days; since the samples I'm trying now are three times larger and I'm limited in resources, I was hoping to make it faster by running on a GPU. Do you have any idea what the reason could be?

PengNi commented 3 years ago

Hi @MunissaSadykova , thanks for your interest in our tool.

About your issue, I cannot figure it out without more information.

I suggest that (1) you check whether torch works with CUDA in your environment, as follows (open a python console):

# python console
import torch; torch.cuda.is_available()
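A fuller diagnostic can help here as well. The sketch below (the helper name `cuda_report` is just for illustration, not part of deepsignal-plant) reports the torch build and CUDA status, and degrades gracefully when torch is not installed:

```python
# Illustrative diagnostic sketch: summarize the local torch/CUDA setup.
def cuda_report():
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_available": False}
    info = {
        "torch": torch.__version__,        # installed torch version
        "cuda_built": torch.version.cuda,  # CUDA version torch was built with (None for CPU-only builds)
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info

print(cuda_report())
```

If `cuda_built` is None, a CPU-only torch wheel was installed and no CUDA setup will make `cuda_available` True until torch is re-installed with CUDA support.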

(2) Could you provide the commands you used (including basecall by Guppy, resquiggle by tombo, call_mods by deepsignal-plant), and the log file of tombo and deepsignal-plant?

Best, Peng

MunissaSadykova commented 3 years ago

Hello! Thank you very much for your reply!

(1) I checked torch and it returned False:

>>> import torch; torch.cuda.is_available()
False

I tried to re-install it, but nothing changed.

(2) Sure, here's the list of commands, and I'm attaching a link to the log files:

/apps/unit/SazeU/ont-guppy/bin/guppy_basecaller -i ./fast5s -r -s ./sample_output -c dna_r9.4.1_450bps_hac_prom.cfg --device CUDA:0
tombo resquiggle /flash/SazeU/Munissa/dna-methylation/2_data/4_WT_met1/single_read_fast5 \
/bucket/SazeU/Munissa/dna-methylation/2_data/GCF_000001735.4_TAIR10.1_genomic.fna \
--processes 10 --corrected-group RawGenomeCorrected_000 --overwrite --basecall-group Basecall_1D_000 
deepsignal_plant extract -i /bucket/SazeU/Munissa/dna-methylation/2_data/results_deepsignal_p/4_WT_met1/single_read_fast5 \
--reference_path /bucket/SazeU/Munissa/dna-methylation/2_data/GCF_000001735.4_TAIR10.1_genomic3.fna \
-o /flash/SazeU/Munissa/dna-methylation/2_data/4_WT_met1/WT_met1_fast5s.CHH.features.tsv \
--corrected_group RawGenomeCorrected_000 --nproc 30 --motifs CHH
CUDA_VISIBLE_DEVICES=0 deepsignal_plant call_mods \
--input_path /work/SazeU/Munissa/4_WT_met1/single_read_fast5 \
--model_path /work/SazeU/Munissa/4_WT_met1/model.dp2.CHH.arabnrice2-1_R9.4plus_tem.bn13_sn16.denoise_signal_bilstm.both_bilstm.b13_s16_epoch7.ckpt \
--result_file ./WT_met1_fast5s.CHH.call_mods.tsv \
--corrected_group RawGenomeCorrected_000 \
--reference_path /work/SazeU/Munissa/4_WT_met1/GCF_000001735.4_TAIR10.1_genomic4.fna \
--motifs CHH --nproc 30 --nproc_gpu 6

Log files (tombo, deepsignal_extract, deepsignal_call_mods): https://drive.google.com/drive/folders/1IAy0m3HtTft5ioyUrgBugiPIJ0FkBZKo?usp=sharing

P.S. Just in case: I ran Guppy and deepsignal_plant call_mods on the GPU, and the other commands on the CPU.

I would appreciate any further suggestions.

Best wishes, Munissa

PengNi commented 3 years ago

Hi Munissa,

According to the logs, it seems that the commands are fine. I think the problem is that your torch version is not compatible with your CUDA version.

So what is the version of CUDA installed on your server? FWIW, you can check your CUDA version with nvcc -V or nvidia-smi, and then install an appropriate torch version. torch >=1.2.0, <=1.6.0 is suggested.
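As a quick sanity check, the suggested range can be tested against an installed torch version string. The helper below is only an illustration (not part of deepsignal-plant):

```python
# Illustrative sketch: check whether a torch version string falls in the
# suggested >=1.2.0, <=1.6.0 range.
def version_tuple(v):
    # "1.6.0+cu102" -> (1, 6, 0); local build tags after "+" are ignored
    return tuple(int(x) for x in v.split("+")[0].split(".")[:3])

def torch_version_ok(v, lo="1.2.0", hi="1.6.0"):
    return version_tuple(lo) <= version_tuple(v) <= version_tuple(hi)

print(torch_version_ok("1.6.0+cu102"))  # True
print(torch_version_ok("1.11.0"))       # False: newer than the suggested range
```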

Best, Peng

PengNi commented 3 years ago

Closed due to inactivity.

HeLi-80 commented 1 year ago

Hello PengNi

we have problems getting deepsignal_plant call_mods to work with our NVIDIA A100-PCIE-40GB. cudatoolkit: 11.7, CUDA Version: 11.6, pytorch: 1.11, python: 3.9

with import torch; torch.cuda.is_available() we get False

When we run deepsignal_plant call_mods with this setup

fast5 files as input, use GPU

CUDA_VISIBLE_DEVICES=0 deepsignal_plant call_mods --input_path fast5s/ \
--model_path model.dp2.CNN.arabnrice2-1_120m_R9.4plus_tem.bn13_sn16.both_bilstm.epoch6.ckpt \
--result_file fast5s.C.call_mods.tsv \
--corrected_group RawGenomeCorrected_000 \
--motifs C --nproc 30 --nproc_gpu 6

the process works, but only on the CPU.

Some advice?

Thank you

PengNi commented 1 year ago


Hi @HeLi-80 , maybe it is because the cudatoolkit version is newer than the CUDA version. You can try re-installing pytorch as:

conda install pytorch==1.11.0 cudatoolkit=10.2 -c pytorch

OR create a new environment as:

conda env create --name deepsignalpenv -f /path/to/deepsignal-plant/environment.yml

GPU only works when torch.cuda.is_available() is True.

Best, Peng

HeLi-80 commented 1 year ago

Hi @PengNi, we initially installed pytorch 1.11 and cudatoolkit=10.2, but we got an error message concerning our GPU, an NVIDIA A100-PCIE-40GB:

...python3.9/site-packages/torch/cuda/__init__.py:145: UserWarning: NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. If you want to use the A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
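For context, that warning means the PyTorch build was compiled only for older GPU architectures. A minimal sketch of the check it describes (the arch list is copied from the warning; the helper name is ours, not PyTorch's):

```python
# Sketch of the compatibility check behind the warning: a PyTorch build can
# only run on GPUs whose compute capability it was compiled for.
# Arch list copied from the warning above (sm_37 ... sm_75).
OLD_BUILD_ARCHS = {(3, 7), (5, 0), (6, 0), (6, 1), (7, 0), (7, 5)}

def gpu_supported(capability, archs=OLD_BUILD_ARCHS):
    # capability is a (major, minor) pair, as returned by
    # torch.cuda.get_device_capability()
    return capability in archs

print(gpu_supported((8, 0)))  # A100 is sm_80: not supported by that build
print(gpu_supported((7, 5)))  # sm_75 (e.g. Turing): supported
```

This is why a cudatoolkit 10.2 build cannot drive an A100: sm_80 support requires a build linked against CUDA 11.x.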

We created a new environment with cudatoolkit 11.7 and the UserWarning disappeared, but with "import torch; torch.cuda.is_available()" we still get False, as I already mentioned. We will try to downgrade cudatoolkit and will let you know how it works.

Best, Elio

HeLi-80 commented 1 year ago

Hi @PengNi, we solved with cudatoolkit: 11.1

>>> import torch; torch.cuda.is_available()
True

The NVIDIA A100-PCIE-40GB works fine.

Best, Elio

PengNi commented 1 year ago


@HeLi-80 , that's great!

Best, Peng