Hi @ritma001,
Thanks for your interest. To call modifications with deepsignal, a motif sequence must be set. See --motifs and --mod_loc in the extract or call_mods module for more details (deepsignal.py#278).
As a model-based method, deepsignal cannot predict modifications in every motif that exists. Currently we only have two trained models, one for CpG and one for GATC.
Best, Peng
Dear Peng,
Thanks for your quick response. I have two follow-up questions:
Q1: Since I use the GATC model, shouldn't I expect a methylated "A" to be predicted at the 9th position of the 17-mer?
Q2: Is it necessary to specify a motif when calling modifications in the first place? To clarify, my initial goal is not to find a specific motif; I just want a sanity check of the output.
If the model was trained on GATC motifs, I would expect the motif and its methylated "A" to appear in the middle of the 17-mer (9th position).
Best,
Wannisa
Dear Wannisa,
If you use the GATC model, you should also set --motifs to GATC and --mod_loc to 1, so that only GATC k-mers are extracted from the fast5 files.
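To make the effect of these two options concrete, here is a minimal sketch of how a motif plus a modification offset could select target sites and the 17-mers centred on them. The function name candidate_sites is hypothetical and this is not deepsignal's actual extraction code; it only scans the forward strand of a plain sequence string and assumes mod_loc is a 0-based offset into the motif (so 1 points at the A of GATC).

```python
from typing import Iterator, Tuple


def candidate_sites(seq: str, motif: str, mod_loc: int, k: int = 17) -> Iterator[Tuple[int, str]]:
    """Hypothetical helper: yield (position, k-mer) for each motif hit.

    position is the 0-based index in seq of the base at motif offset mod_loc;
    the k-mer is centred on that base (index k // 2, i.e. the 9th base for k=17).
    Only the forward strand is scanned.
    """
    half = k // 2
    start = seq.find(motif)
    while start != -1:
        pos = start + mod_loc  # base targeted by --mod_loc (assumed 0-based)
        # skip hits too close to either end to centre a full k-mer on them
        if pos - half >= 0 and pos + half < len(seq):
            yield pos, seq[pos - half: pos + half + 1]
        start = seq.find(motif, start + 1)  # allow overlapping hits
```

For example, list(candidate_sites("T" * 10 + "GATC" + "T" * 10, "GATC", 1)) gives a single site at index 11 whose 17-mer has the A of GATC in the middle, which is the layout one would expect in the output when the options above are set.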
Yes, it's necessary. Because deepsignal is a model-based method, it can't predict all motifs.
Best, Peng
Dear Peng,
Thank you for the clarification. It is clear to me now.
Hi there,
I recently used deepsignal to detect DNA methylation and got the result shown below.
This is the result from E. coli genome sequencing, and I used the .ckpt from model.GATC.R9_2D.tem.puc19.bn17.sn360.tar.gz for --model_path. From this output snippet, I understand that the last two rows show a methylated C at the 9th position of the 17-mers. I also confirmed the predicted methylation positions against the genome positions given in the 2nd column.
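The position check described above could be scripted roughly as follows. This is a hypothetical helper, not part of deepsignal, and it rests on two assumptions you should verify against your own output: that the reported position is 0-based, and that k-mers on the "-" strand are reported as the reverse complement of the reference window.

```python
# Complement table for plain DNA (N maps to N)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def kmer_matches_reference(genome: str, pos: int, strand: str, kmer: str) -> bool:
    """Hypothetical check: does the reported k-mer equal the reference window centred on pos?

    Assumes pos is 0-based and that '-' strand k-mers are reverse complements.
    """
    half = len(kmer) // 2
    # the window must fit entirely inside the genome
    if pos - half < 0 or pos + half >= len(genome):
        return False
    window = genome[pos - half: pos + half + 1]
    if strand == "-":
        # read the opposite strand 5'->3'
        window = window.translate(COMPLEMENT)[::-1]
    return window == kmer.upper()
```

If most rows fail this check, an off-by-one (1-based positions) or a different strand convention is the first thing to rule out before interpreting the k-mers.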
However, I observe that G_A_TC, the recognition motif of one of the DNA methyltransferases (the methylated nucleotide in each motif is flanked by _), does not overlap with the predicted methylated nucleotides. Interestingly, the motif usually appears elsewhere in the 17-mers.
I also checked other recognition motifs of the different DNA methyltransferases mentioned in PMC4231299, i.e. A_A_CGTCG, CC[A/T]GG and ATGC_A_T. None of the expected methylated nucleotides (flanked by _) in these motifs coincides with the predicted methylation site (the 9th nucleotide of the 17-mer), yet these motifs frequently appear at various other positions in the 17-mers.
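A sketch of the sanity check described above: it tests whether the centre base of a 17-mer (index 8, i.e. the 9th position) is the marked base of some occurrence of a motif. The helper name and the small IUPAC table (W = A/T, so CC[A/T]GG is written CCWGG) are assumptions for illustration; mod_loc is the 0-based offset of the marked base, e.g. 1 for G_A_TC and A_A_CGTCG, and 4 for ATGC_A_T.

```python
import re

# Subset of IUPAC nucleotide codes, translated to regex character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "S": "[CG]",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def centre_is_motif_site(kmer: str, motif: str, mod_loc: int) -> bool:
    """Hypothetical check: is the centre base of kmer the marked base of a motif hit?"""
    centre = len(kmer) // 2  # 8 for a 17-mer, i.e. the 9th position
    # zero-width lookahead so that overlapping motif occurrences are all found
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in motif.upper()) + ")")
    return any(m.start() + mod_loc == centre for m in pattern.finditer(kmer.upper()))
```

For CC[A/T]GG the post does not mark a methylated base, so the offset has to be supplied from the methyltransferase's known specificity. Counting how many called k-mers pass this check per motif turns the observation above into a number.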
So I am unsure how reliable the prediction is and whether I am interpreting the result correctly. It is quite a detailed question, and I would be glad to receive any feedback.
Best,
Wannisa