Oshlack / missMethyl

Bioconductor package for analysis of methylation data from Illumina's Infinium HumanMethylation arrays.
10 stars 4 forks source link

Fix getMappedEntrezIDs: use only `all.cpgs` to calculate equiv #8

Closed samabs closed 4 years ago

samabs commented 4 years ago

Hi,

I believe there is a bug in the getMappedEntrezIDs function which means that equiv is calculated using all CpGs in the array (and not only those contained in all.cpg). This is in contrast to freq - which doesn't account for multimapping CpGs but does use only those CpGs defined in all.cpg.

As I understand it, equiv should always be equal to or less than freq. Without the fix we get this:

library(tidyverse)

set.seed(1)

anno = missMethyl:::.getFlatAnnotation("EPIC")

all.cpg = sample(anno$cpg, size = 500000, replace = FALSE)
sig.cpg = sample(all.cpgs, size = 10000, replace = FALSE)

out = missMethyl::getMappedEntrezIDs(sig.cpg, all.cpg)

qplot(out$equiv, out$freq) +
  geom_abline(intercept = 1)

equiv_vs_freq

sum(out$equiv > out$freq)

gives us 15653 CpGs with higher equiv values than freq

With the fix (which is only a few lines). we get:

equiv_vs_freq_fix

I believe this bug has an impact in your gsameth function since you're calculating the bias using all the CpGs in the array and not only those in all.cpg.