bioinfomaticsCSU / deepsignal

Detecting methylation using signal-level features from Nanopore sequencing reads
GNU General Public License v3.0

Information about deepsignal #56

Closed by cyber-ux 4 years ago

cyber-ux commented 4 years ago

Hi, I would like some more information about this tool, so I have a few questions:

  1. What are the differences between extracting features and calling modifications?
  2. After launching deepsignal, how can I see when the tool has finished, and how long does it take?
  3. In feature extraction, can the value of --write-path be changed, or is the one shown a standard/default?
  4. In modification calling, can the value of --model_path be changed?

PengNi commented 4 years ago

Hi Lucio,

Thanks for your interest.

  1. extract writes features to a text file, which can be used for training models or for calling methylation. call_mods calls methylation either from fast5 files or from the text file generated by extract.
  2. deepsignal currently doesn't support estimating progress in real time. We may implement this in the future.
  3. --write-path can be set to whatever path/file name you want.
  4. --model_path must be set to the file path of the model. You can download the model from Google Drive. After uncompressing it, the model can be used for call_mods:
    deepsignal call_mods --input_path fast5s.al/ --model_path model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/bn_17.sn_360.epoch_9.ckpt --result_file fast5s.al.CpG.call_mods.tsv --reference_path GCF_000146045.2_R64_genomic.fna --corrected_group RawGenomeCorrected_001 --nproc 10 --is_gpu no
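
Since deepsignal does not report progress, one way to see when a run ends and how long it took is to launch it from a small wrapper that blocks until the process exits. The following is only a sketch: the helper names `build_call_mods_cmd` and `run_and_time` are hypothetical, the flags are just the ones shown in the command above, and treating exit code 0 as success is the usual convention rather than something documented here.

```python
import subprocess
import time


def build_call_mods_cmd(input_path, model_path, result_file, reference_path,
                        corrected_group="RawGenomeCorrected_001",
                        nproc=10, is_gpu="no"):
    """Assemble the call_mods command line from the flags shown in this thread."""
    return [
        "deepsignal", "call_mods",
        "--input_path", input_path,
        "--model_path", model_path,
        "--result_file", result_file,
        "--reference_path", reference_path,
        "--corrected_group", corrected_group,
        "--nproc", str(nproc),
        "--is_gpu", is_gpu,
    ]


def run_and_time(cmd):
    """Run cmd, wait until it exits, and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(cmd)  # blocks until the process has finished
    return completed.returncode, time.monotonic() - start


# Example use (paths taken from the command above):
# cmd = build_call_mods_cmd(
#     "fast5s.al/",
#     "model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/bn_17.sn_360.epoch_9.ckpt",
#     "fast5s.al.CpG.call_mods.tsv",
#     "GCF_000146045.2_R64_genomic.fna",
# )
# code, seconds = run_and_time(cmd)
# print(f"exit code {code} after {seconds:.0f} s")
```

The run has ended once `run_and_time` returns; in a plain shell, the equivalent signal is simply the prompt coming back.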

Best, Peng

cyber-ux commented 4 years ago

So it's a standard/default model, and if I don't download it I can't call the modifications? Is this model very large? How much memory does it take up? I also have some other questions:

1- I launched the feature extraction, but I can't tell when the tool ends (the same goes for call_mods and frequencies). Can you give me some examples to understand when the tool has finished? It also prints “write_process started”; what does that mean? This is the output it gives me:

    fast5_dir: fast5_single
    recursively: yes
    corrected_group: RawGenomeCorrected_000
    basecall_subgroup: BaseCalled_template
    reference_path: GCF_000001405.39_GRCh38.p13_genomic.fna
    is_dna: yes
    normalize_method: mad
    methy_label: 1
    kmer_len: 17
    cent_signals_len: 360
    motifs: CG
    mod_loc: 0
    write_path: fast5s.al.CpG.signal_features.17bases.rawsignals_360.tsv
    nproc: 20
    f5_batch_num: 100
    ===============================================
    14340000 fast5 files in total..
    parse the motifs string..
    read genome reference file..
    write_process started..

Does that mean it has finished, or is it still running?

2- I looked in Google Drive and there are three models. On what basis should I choose one?

Best

PengNi commented 4 years ago

Hi Lucio,

The model takes 8-11 GB of GPU RAM.

1- write_process started.. means the extract process has just started. deepsignal can't estimate when the process will end.
2- You can check the README for a description of the trained models. model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+.tar.gz is our latest model for 5mCpG detection.
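
As a rough workaround for the missing progress report, one could watch the output file from a second terminal. This sketch assumes that extract writes the file given as write_path incrementally while it runs, which is not confirmed in this thread; `file_size` and `watch_growth` are hypothetical helper names.

```python
import os
import time


def file_size(path):
    """Return the size of path in bytes, or 0 if the file does not exist yet."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0


def watch_growth(path, interval_seconds=60, rounds=5):
    """Sample the file size `rounds` times and return the list of sizes seen."""
    sizes = []
    for i in range(rounds):
        sizes.append(file_size(path))
        print(f"{path}: {sizes[-1]} bytes")
        if i < rounds - 1:
            time.sleep(interval_seconds)  # wait before taking the next sample
    return sizes


# Example use with the write_path from the log above:
# watch_growth("fast5s.al.CpG.signal_features.17bases.rawsignals_360.tsv")
```

A size that keeps growing suggests the run is still writing. A size that stops changing is not proof that the run is complete; the dependable signal is the deepsignal process itself exiting.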

Best, Peng

cyber-ux commented 4 years ago

Hi Peng, but does the model occupy 8G-11G once unzipped? How much does the compressed one take roughly? Best Regards

PengNi commented 4 years ago

~500 MB of disk space once unzipped.

cyber-ux commented 4 years ago

So it is 500 megabytes compressed and 8-11 gigabytes decompressed. Did I get that right?

PengNi commented 4 years ago

Not exactly. The model takes ~500 MB of disk space once unzipped, and takes 8-11 GB of GPU RAM when calling methylation.

cyber-ux commented 4 years ago

But before decompression how much does it take up ?

PengNi commented 4 years ago

The model can't be loaded to GPU memory without unzipping it.

cyber-ux commented 4 years ago

And to disk ?

PengNi commented 4 years ago

Sorry, I don't understand what you mean. The model tar.gz file is already on disk once you download it. It has to be unzipped and loaded onto the GPU for methylation calling.

cyber-ux commented 4 years ago

What I ask is: when I download the tar.gz file of the model how much does it occupy on disk?

PengNi commented 4 years ago

187 MB. The file size can be checked in Google Drive.
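
The two disk figures discussed above (the downloaded archive versus its unpacked contents) can also be checked locally. This is a minimal sketch using Python's standard `tarfile` module; `archive_sizes` is a hypothetical helper name, and the example file name is the one mentioned earlier in this thread.

```python
import os
import tarfile


def archive_sizes(archive_path):
    """Return (compressed_bytes, unpacked_bytes) for a .tar.gz without extracting it."""
    compressed = os.path.getsize(archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        # Sum the sizes recorded in the tar headers of regular files only.
        unpacked = sum(m.size for m in archive.getmembers() if m.isfile())
    return compressed, unpacked


# Example use:
# compressed, unpacked = archive_sizes(
#     "model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+.tar.gz")
# print(f"{compressed / 1e6:.0f} MB compressed, {unpacked / 1e6:.0f} MB unpacked")
```

Neither number says anything about GPU memory: the 8-11 GB figure is what the model needs while calling methylation, not its size on disk.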

cyber-ux commented 4 years ago

Ok! Thank you so much for helping me! Best Regards

cyber-ux commented 4 years ago

Hi Peng, I unzipped the model and got a folder containing 'bn_17.sn_360.epoch_9.ckpt.index' and 'bn_17.sn_360.epoch_9.ckpt.meta', but the command in the instructions only shows '--model_path model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/bn_17.sn_360.epoch_9.ckpt'. My questions are:

1- Do I just enter 'model ..... ckpt', or do I enter 'model .... ckpt.index' or 'model .... ckpt.meta'? I was thinking of using 'model ... ckpt.index', which I think is the right one. Can you confirm?
2- What is the difference between 'model...ckpt.index' and 'model...ckpt.meta'?

PengNi commented 4 years ago

Hi Lucio,

Please refer to #11 .
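
For readers without #11 at hand, here is a hedged illustration rather than a summary of that issue. TensorFlow-style checkpoints are commonly saved as several files sharing one prefix (such as `.index`, `.meta`, and usually one or more `.data-*` files), and tools are typically given the shared prefix rather than any single file. That deepsignal's model follows this layout is an assumption here, though it is consistent with the `--model_path` value ending in `.ckpt` in the command earlier in this thread. The helper names below are hypothetical.

```python
import glob
import os


def checkpoint_files(prefix):
    """List the file names that share a checkpoint prefix, e.g. '<prefix>.index'."""
    pattern = glob.escape(prefix) + ".*"
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def looks_like_checkpoint(prefix):
    """Return True if the '.index' companion of the prefix exists on disk."""
    return os.path.exists(prefix + ".index")


# Example use with the path from the command earlier in this thread:
# prefix = ("model.CpG.R9.4_1D.human_hx1.bn17.sn360.v0.1.7+/"
#           "bn_17.sn_360.epoch_9.ckpt")
# print(checkpoint_files(prefix))
# print(looks_like_checkpoint(prefix))
```

If `looks_like_checkpoint` returns True for the path ending in `.ckpt`, that prefix is the value to try for `--model_path`; #11 remains the authoritative answer.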

Best, Peng