Closed ShangjinTan closed 5 years ago
Hi @ShangjinTan, I agree that the lack of training data for many types of modification detection is a major issue. I wish I had more training data to improve DeepMod. However, not much training data is available for other types of modifications, including 4mC. Only a limited amount of data exists for 4mC (<1000 motif sites); a deep learning model could be trained on it to make 4mC predictions, but the performance is unknown because the available 4mC data set is so small. If this is your only choice, you might give it a try.
I would very much like to try DeepMod in my study, since it reports not only the genomic positions of modifications with high accuracy but also the methylation percentage.
I think the major problem with modification detection from nanopore data is the lack of training data. As mentioned in the paper, prediction accuracy depends on sequence context. Also, 4mC, a modification commonly found in bacteria, is missing from DeepMod. So I wonder whether you plan to further improve DeepMod for more sequence contexts and for 4mC by producing more training data.