Shians / NanoMethViz

Apache License 2.0

DeepSignal #30

Open ishamada opened 1 year ago

ishamada commented 1 year ago

I would like to know whether NanoMethViz supports output from DeepSignal (a methylation caller).

Shians commented 1 year ago

It currently doesn't support DeepSignal, mainly because I didn't think DeepSignal's reliance on single-read fast5 files was viable in the long term.

But since it's still being updated and clearly has users, if you can provide a small example of what DeepSignal output looks like, I can potentially implement it.

ishamada commented 1 year ago

Here is the output of DeepSignal2's call_mods command, which writes a TSV file. The command is:

    deepsignal2 call_mods --input_path fast5s/ --model_path model.dp2.CG.R9.4_1D.human_hx1.bn17_sn16.both_bilstm.b17_s16_epoch4.ckpt --result_file fast5s.CG.call_mods.tsv --corrected_group RawGenomeCorrected_000 --motifs CG --nproc 10

The modification_call file is a tab-delimited text file in the following format:

  1. chrom: the chromosome name
  2. pos: 0-based position of the targeted base in the chromosome
  3. strand: +/-, the aligned strand of the read to the reference
  4. pos_in_strand: 0-based position of the targeted base in the aligned strand of the chromosome (legacy column, not necessary for downstream analysis)
  5. readname: the read name
  6. read_strand: t/c, template or complement
  7. prob_0: [0, 1], the probability of the targeted base predicted as 0 (unmethylated)
  8. prob_1: [0, 1], the probability of the targeted base predicted as 1 (methylated)
  9. called_label: 0/1, unmethylated/methylated
  10. k_mer: the kmer around the targeted base
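
As a rough illustration of how these ten columns could be read programmatically, the following Python sketch parses one line of a call_mods TSV into a typed record. It assumes the file has no header row and uses exactly the column order listed above; the `ModCall` and `parse_call_line` names are hypothetical and are not part of DeepSignal2 or NanoMethViz.

```python
from dataclasses import dataclass


@dataclass
class ModCall:
    """One per-read call from a DeepSignal2 call_mods TSV (hypothetical helper type)."""
    chrom: str
    pos: int            # 0-based position on the chromosome
    strand: str         # '+' or '-'
    pos_in_strand: int  # legacy column, kept only for completeness
    readname: str
    read_strand: str    # 't' (template) or 'c' (complement)
    prob_0: float       # probability the base is unmethylated
    prob_1: float       # probability the base is methylated
    called_label: int   # 0 = unmethylated, 1 = methylated
    k_mer: str


def parse_call_line(line: str) -> ModCall:
    """Parse one tab-delimited call_mods line, assuming the 10-column order above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return ModCall(
        chrom=fields[0],
        pos=int(fields[1]),
        strand=fields[2],
        pos_in_strand=int(fields[3]),
        readname=fields[4],
        read_strand=fields[5],
        prob_0=float(fields[6]),
        prob_1=float(fields[7]),
        called_label=int(fields[8]),
        k_mer=fields[9],
    )


if __name__ == "__main__":
    # Made-up example line for illustration only, not real DeepSignal2 output.
    example = "chr1\t10468\t+\t10468\tread_001\tt\t0.12\t0.88\t1\tACGTACGTACGTACGTA"
    call = parse_call_line(example)
    print(call.chrom, call.pos, call.called_label)
```

Reading a whole file would then be a matter of calling `parse_call_line` on each line.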

The command for computing the modification frequency is:

    python /path/to/deepsignal2/scripts/call_modification_frequency.py --input_path fast5s.CG.call_mods.tsv --result_file fast5s.CG.call_mods.frequency.tsv

The resulting modification_frequency file can be saved either in bedMethyl format (by adding --bed to the command) or, by default, as a tab-delimited text file.

In its default tab-delimited form, the modification_frequency file has the following format:

  1. chrom: the chromosome name
  2. pos: 0-based position of the targeted base in the chromosome
  3. strand: +/-, the aligned strand of the read to the reference
  4. pos_in_strand: 0-based position of the targeted base in the aligned strand of the chromosome (legacy column, not necessary for downstream analysis)
  5. prob_0_sum: sum of the probabilities of the targeted base predicted as 0 (unmethylated)
  6. prob_1_sum: sum of the probabilities of the targeted base predicted as 1 (methylated)
  7. count_modified: number of reads in which the targeted base counted as modified
  8. count_unmodified: number of reads in which the targeted base counted as unmodified
  9. coverage: number of reads aligned to the targeted base
  10. modification_frequency: modification frequency
  11. k_mer: the kmer around the targeted base
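
The relationship between the per-read calls and these frequency columns can be sketched as a group-by over (chrom, pos, strand). This is a simplified reconstruction from the column descriptions alone, assuming each read counts as modified or unmodified according to its called_label, that coverage is the number of contributing reads, and that modification_frequency is count_modified / coverage. The real call_modification_frequency.py may differ (for example by filtering low-confidence calls), and the `aggregate_frequency` name and input tuple layout are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable


def aggregate_frequency(calls: Iterable[tuple[str, int, str, float, float, int]]) -> dict:
    """Group per-read calls by site and summarise them.

    Each input tuple is (chrom, pos, strand, prob_0, prob_1, called_label),
    a subset of the call_mods columns. Returns a dict keyed by
    (chrom, pos, strand) holding frequency-style fields.
    """
    sites = defaultdict(lambda: {"prob_0_sum": 0.0, "prob_1_sum": 0.0,
                                 "count_modified": 0, "count_unmodified": 0})
    for chrom, pos, strand, prob_0, prob_1, called_label in calls:
        site = sites[(chrom, pos, strand)]
        site["prob_0_sum"] += prob_0
        site["prob_1_sum"] += prob_1
        if called_label == 1:
            site["count_modified"] += 1
        else:
            site["count_unmodified"] += 1

    for site in sites.values():
        # Assumption: coverage counts every read that contributed a call.
        site["coverage"] = site["count_modified"] + site["count_unmodified"]
        site["modification_frequency"] = site["count_modified"] / site["coverage"]
    return dict(sites)


if __name__ == "__main__":
    # Three made-up reads covering the same CpG site.
    calls = [
        ("chr1", 10468, "+", 0.12, 0.88, 1),
        ("chr1", 10468, "+", 0.70, 0.30, 0),
        ("chr1", 10468, "+", 0.05, 0.95, 1),
    ]
    print(aggregate_frequency(calls)[("chr1", 10468, "+")])
```

With two of three reads called methylated, this sketch reports a coverage of 3 and a frequency of about 0.67 for that site.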

Shians commented 1 year ago

Thanks for that, and sorry for the late response; I didn't seem to get a notification of your reply. The format looks well suited to conversion, so I'll see if I can implement something next week.
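
One way such a conversion might look, purely as a sketch: map each call_mods row to a per-read record with a 1-based position and a log-odds statistic, log(prob_1 / prob_0). The output layout (sample, chr, pos, strand, statistic, read_name), the shift to 1-based coordinates, the `EPS` clipping constant and the `to_per_read_row` name are all assumptions for illustration, not a confirmed NanoMethViz input specification.

```python
import math

# Hypothetical clipping bound so that probabilities of exactly 0 or 1
# do not produce infinite log-odds.
EPS = 1e-6


def to_per_read_row(line: str, sample: str) -> tuple[str, str, int, str, float, str]:
    """Convert one call_mods TSV line into a (sample, chr, pos, strand, statistic, read_name) row.

    Assumptions (not confirmed NanoMethViz behaviour):
      * positions are shifted from DeepSignal2's 0-based coordinates to 1-based;
      * the statistic is the log-odds log(prob_1 / prob_0), which is
        positive when the base is more likely methylated.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, strand = fields[0], int(fields[1]), fields[2]
    readname = fields[4]
    # Clip both probabilities into [EPS, 1 - EPS] before taking the ratio.
    prob_0 = min(max(float(fields[6]), EPS), 1 - EPS)
    prob_1 = min(max(float(fields[7]), EPS), 1 - EPS)
    statistic = math.log(prob_1 / prob_0)
    return (sample, chrom, pos + 1, strand, statistic, readname)
```

Log-odds keep the sign convention simple (positive leans methylated, negative leans unmethylated); a different statistic may be needed depending on what an importer actually expects.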