Shians / NanoMethViz

https://shians.github.io/NanoMethViz/
Apache License 2.0
24 stars 1 forks source link

The mod_prob graph does not correspond with percent_modified from modkit #44

Open pubafra opened 3 months ago

pubafra commented 3 months ago

Hello, thank for this tool. I would like to report what I think is a bug. Honestly, I am not sure if I am using in right way the tool or maybe is in fact a bug.

I work with Nanopore data to identify modified bases in bacteria. I used dorado 0.7.2 with rerio models for 6mA and 5mC base calling. After mapping the bam to reference (pEC598 plasmid) I used modkit to get bedMethyl table. I am interest in specific m6A modbase of GATC motif. I created manually the annotation of exons (anno_exon.bed) table with the bed of GATC motif in the pEC598 .

gene_id chr strand  start   end transcript_id   symbol
1   pEC598  +   105 109 GATC_1  GATC_1
1   pEC598  +   137 141 GATC_2  GATC_2
1   pEC598  +   208 212 GATC_3  GATC_3
1   pEC598  +   292 296 GATC_4  GATC_4
1   pEC598  +   491 495 GATC_5  GATC_5
1   pEC598  +   1664    1668    GATC_6  GATC_6
1   pEC598  +   1739    1743    GATC_7  GATC_7
1   pEC598  +   1808    1812    GATC_8  GATC_8

I created the plot_region following:

file <- "pEC598_mapped.bam"
exons_raw <- read_tsv("anno_exon.bed", col_names = TRUE, show_col_types = TRUE)
exons_df <- as.data.frame(exons_raw)
samples <- data.frame(sample = "sample1", group = "group1")
methy <- ModBamFiles(samples = "sample1", paths = file)
mbr <- ModBamResult(methy, samples, exons_df, mod_code = "a")

plot_region(mbr, "pEC598", -10, 2200, heatmap = TRUE)

I got the graphic:

sample1

In the bedMethyl table from modkit we can find percent_modified (column 11). The next table was filtered to 6mA and strand + just to check the percent_modified for my interest regions and as you can see all are > 50%:

chrom   start_position  end_position    modified_base_code_and_motif    score   strand  start_position  end_position    color   Nvalid_cov  percent_modified    Nmod    Ncanonical  Nother_mod  Ndelete Nfail   Ndiff   Nnocall
pEC598  106 107 a   39  +   106 107 255,0,0 39  97.44   38  1   0   0   2   0   0
pEC598  138 139 a   37  +   138 139 255,0,0 37  97.30   36  1   0   0   3   0   0
pEC598  209 210 a   39  +   209 210 255,0,0 39  100.00  39  0   0   0   3   0   0
pEC598  275 276 a   27  +   275 276 255,0,0 27  88.89   24  3   0   0   15  0   0
pEC598  276 277 a   24  +   276 277 255,0,0 24  87.50   21  3   0   0   18  0   0
pEC598  281 282 a   24  +   281 282 255,0,0 24  79.17   19  5   0   0   18  0   0
pEC598  287 288 a   22  +   287 288 255,0,0 22  54.55   12  10  0   0   19  1   0
pEC598  293 294 a   40  +   293 294 255,0,0 40  100.00  40  0   0   0   2   0   0
pEC598  366 367 a   37  +   366 367 255,0,0 37  100.00  37  0   0   0   4   0   0
pEC598  492 493 a   34  +   492 493 255,0,0 34  100.00  34  0   0   0   9   0   0
pEC598  1665    1666    a   36  +   1665    1666    255,0,0 36  94.44   34  2   0   0   6   0   0
pEC598  1731    1732    a   18  +   1731    1732    255,0,0 18  77.78   14  4   0   2   22  0   0
pEC598  1740    1741    a   41  +   1740    1741    255,0,0 41  100.00  41  0   0   0   1   0   0
pEC598  1743    1744    a   27  +   1743    1744    255,0,0 27  59.26   16  11  0   0   13  2   0
pEC598  1809    1810    a   40  +   1809    1810    255,0,0 40  97.50   39  1   0   0   2   0   0
pEC598  1893    1894    a   23  +   1893    1894    255,0,0 23  73.91   17  6   0   1   18  0   0
pEC598  1899    1900    a   35  +   1899    1900    255,0,0 35  85.71   30  5   0   0   7   0   0

My question is, why the graph in mod_prob has different values to percent_modified bedMethyl ?

I would expect something like this.

sample1_expected

As a told at the beginning, maybe I am doing something wrong or I am misunderstanding the results. Perhaps both mod_prob and percent_modified are different?

Thank you

Shians commented 3 months ago

Would you be able to upload a subset of your bam file for me to check?

pubafra commented 3 months ago

Here is the bam. Thank for your time. B4_pEC598.zip