cultivarium / MicrobeMod

A toolkit for exploring prokaryotic methylation and base modifications in nanopore sequencing
MIT License

Problem with mapped methylated sites #19

Open MarioRinBarr opened 5 months ago

MarioRinBarr commented 5 months ago

Hi,

I am having trouble interpreting the mapped_methylated_sites.tsv table. For each position, the table gives counts for the different base categories (modified base, unmodified bases, and so on). My understanding is that adding up the 5 categories should give the total number of reads covering that position. However, when I look at the alignment made with minimap2, I find many more bases at that position than the table shows. In some cases the table reports only 5 bases in total, while the alignment has 10000 reads at that position. Am I interpreting the data wrong?

Thank you very much

alexcritschristoph commented 4 months ago

Hi Mario, can you show a few example lines from the table? Are all of the positions failing to add up, or only a subset of them?

MarioRinBarr commented 4 months ago

Sorry it took me so long to get back to you. I am attaching an example (example.csv) of one potentially methylated A position. The table says there are 126 reads in total, 29 of them A (methylated or not) and 90 that are something else. However, when I look at the alignment in Geneious, there are 6662 reads at that position: 6476 As, 39 Cs, 57 Gs and 90 Ts.
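The size of the gap in this example can be checked with a few lines of Python. This is only a sketch of the arithmetic using the numbers quoted above; the variable names are made up for illustration and nothing here reads the actual table.

```python
# Base counts at the example position as seen in the Geneious alignment view.
geneious_counts = {"A": 6476, "C": 39, "G": 57, "T": 90}

# Total reads reported for the same position in mapped_methylated_sites.tsv.
table_total = 126

# Depth according to the alignment is the sum over all four bases.
alignment_depth = sum(geneious_counts.values())

# Share of the aligned reads that the table accounts for.
fraction_reported = table_total / alignment_depth

print(f"alignment depth: {alignment_depth}")
print(f"fraction of reads in table: {fraction_reported:.1%}")
```

The four Geneious counts do add up to the 6662 reads quoted, so the table accounts for only about 1.9% of the aligned reads at this position.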

alexcritschristoph commented 4 months ago

Thanks Mario. Is this consistent across your entire reference? For example, is the average coverage of this reference ~6600 while MicrobeMod always returns ~126 coverage? Or is this true for some positions and not others?

MarioRinBarr commented 4 months ago

Thanks for the answer. The problem occurs across the entire reference. I am sending you the complete table, to which I have added the number of As, Ts, Cs and Gs that I counted in Geneious. I have also done another nanopore run and the result is similar: every site gets a much lower count than in the alignment.

summary_number_position.csv

alexcritschristoph commented 3 months ago

Did you ever figure this out, @MarioRinBarr? Can you see what happens if you run modkit pileup --filter-threshold 0.6 on your BAM file?

I think the degree of the discrepancy is due to something specific to your dataset here...
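For readers wondering why a filter threshold could matter at all, here is a toy sketch. It assumes, as a simplification, that each read's call at a position carries a confidence value and that calls below the threshold are left out of the per-position counts. It is not modkit's implementation, the confidences are invented, and tally_calls is a hypothetical helper. It also does not show that filtering explains the discrepancy in this dataset; it only illustrates that a threshold changes how many reads end up being counted.

```python
def tally_calls(call_confidences, threshold):
    """Split per-read call confidences into counted and filtered-out reads.

    call_confidences: hypothetical per-read confidences in [0, 1].
    threshold: calls with confidence below this value are not counted.
    """
    passed = sum(1 for c in call_confidences if c >= threshold)
    failed = len(call_confidences) - passed
    return passed, failed


# Invented confidences for ten reads covering a single position.
confidences = [0.99, 0.95, 0.91, 0.85, 0.72, 0.66, 0.61, 0.58, 0.55, 0.52]

for threshold in (0.6, 0.9):
    passed, failed = tally_calls(confidences, threshold)
    print(f"threshold={threshold}: {passed} counted, {failed} filtered out")
```

With these made-up values, 7 of 10 reads are counted at a threshold of 0.6 and only 3 of 10 at 0.9, so comparing the pileup output at a fixed, fairly permissive threshold against the alignment depth is one way to see whether filtering contributes to the gap.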