Closed DanielEWeeks closed 5 years ago
Hi Dan, I will look into this. When minfi updates their functions we need to update as well. I will let you know when I have a fix. Kathleen
From: Daniel E. Weeks notifications@github.com Sent: January 7, 2019 1:35 PM To: GreenwoodLab/funtooNorm funtooNorm@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [GreenwoodLab/funtooNorm] Fails on EPIC data (#44)
We'd like to apply 'funtooNorm' to some EPIC data, but the current stable version from Bioconductor (version 1.6.0) fails with this error message:
Error in minfi::getGreen(myRGChannelSet)[TypeI.Green$AddressA, ]:
subscript out of bounds.
A similar error discussed at https://support.bioconductor.org/p/97229/, where trying to analyze EPIC data led to this error:
mySampleSet <- fromRGChannelSet(RGSet_combined)
Error in minfi::getRed(myRGChannelSet)[controlTable$Address, ] :
subscript out of bounds
There Kasper Hansen suggests that:
The authors of funtooNorm should modify their code to work with subsetted objects. This is something we now fully support in minfi and they can look in minfi to see examples of how to write it.
One advantage of working with subsetted objects is to allow things like this.
Is this fixed in the development version (version 1.7.0)? Or, if not, any suggestions on how to extend/adjust funtooNorm so that it can process EPIC data?
Thank you, Dan Weeks
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/GreenwoodLab/funtooNorm/issues/44, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AHmO-b63bvAogCmQxn8J2sDiHYYyulx7ks5vA5NRgaJpZM4Z0A2r.
Dear Kathleen,
Thank you for looking into this. I think a similar problem in a function in minfi was resolved early last year, as discussed in https://github.com/hansenlab/minfi/issues/95
So studying how this code was revised might be helpful:
The 'getSnpBeta' code in 'minfi' has been adjusted to avoid this subsetting error:
Previous version:
getSnpBeta <-
function (object)
{
# .isRGOrStop(object)
manifest <- getManifest(object)
snpProbesII <- getProbeInfo(object, type = "SnpII")$Name
M.II <- getGreen(object)[getProbeInfo(object, type = "SnpII")$AddressA,
, drop = FALSE]
U.II <- getRed(object)[getProbeInfo(object, type = "SnpII")$AddressA,
, drop = FALSE]
snpProbesI.Green <- getProbeInfo(object, type = "SnpI")[getProbeInfo(object,
type = "SnpI")$Color == "Grn", ]
snpProbesI.Red <- getProbeInfo(object, type = "SnpI")[getProbeInfo(object,
type = "SnpI")$Color == "Red", ]
M.I.Red <- getRed(object)[snpProbesI.Red$AddressB, , drop = FALSE]
U.I.Red <- getRed(object)[snpProbesI.Red$AddressA, , drop = FALSE]
M.I.Green <- getGreen(object)[snpProbesI.Green$AddressB,
, drop = FALSE]
U.I.Green <- getGreen(object)[snpProbesI.Green$AddressA,
, drop = FALSE]
M <- rbind(M.II, M.I.Red, M.I.Green)
U <- rbind(U.II, U.I.Red, U.I.Green)
rownames(M) <- rownames(U) <- c(snpProbesII, snpProbesI.Red$Name,
snpProbesI.Green$Name)
beta <- M/(U + M + 100)
return(beta)
}
Newer version:
getSnpBeta <- function(object) {
# Check input
.isRGOrStop(object)
# Extract and processSNP probes
snpProbesI <- getProbeInfo(object, type = "SnpI")
snpProbesII <- getProbeInfo(object, type = "SnpII")
snpProbesI.Green <- snpProbesI[snpProbesI$Color == "Grn", , drop = FALSE]
snpProbesI.Red <- snpProbesI[snpProbesI$Color == "Red", , drop = FALSE]
# Extract Red and Green channels
Green <- getGreen(object)
Red <- getRed(object)
# Construct M and U
M.II <- Green[snpProbesII$AddressA, , drop = FALSE]
U.II <- Red[snpProbesII$AddressA, , drop = FALSE]
M.I.Red <- Red[snpProbesI.Red$AddressB, , drop = FALSE]
U.I.Red <- Red[snpProbesI.Red$AddressA, , drop = FALSE]
M.I.Green <- Green[snpProbesI.Green$AddressB, , drop = FALSE]
U.I.Green <- Green[snpProbesI.Green$AddressA, , drop = FALSE]
M <- rbind(M.II, M.I.Red, M.I.Green)
U <- rbind(U.II, U.I.Red, U.I.Green)
rownames(M) <- rownames(U) <- c(
snpProbesII$Name,
snpProbesI.Red$Name,
snpProbesI.Green$Name)
# Compute beta
M / (U + M + 100)
}
Perhaps you aren't done with the fix, but I noticed your commit, so we tried it out, and it still generates a similar error:
R21sampleset = fromRGChannelSet(rgSet)
Output:
A covariance Matrix was build
Error in sigA[TypeI.Green$AddressA, ] : subscript out of bounds
Thank you.
Hi Daniel, Yes, I am still testing.
Daniel, I have just successfully tested funtooNorm on a large EPIC dataset. I would suggest upgrading your R and bioconductor to the latest version. I was using minfi version 1.28.0 Let me know if that helps. Kathleen
if you call your RGset, do you see IlluminaHumanMethylationEPIC as below?
RGset class: RGChannelSetExtended dim: 1052641 309 metadata(0): assays(5): Green Red GreenSD RedSD NBeads rownames(1052641): 1600101 1600111 ... 99810990 99810992 rowData names(0): colnames(309): 200590490035_R01C01 200590490035_R02C01 ... 200970120035_R06C01 200970120035_R07C01 colData names(10): Sample_Name Sample_Well ... filenames cell_type Annotation array: IlluminaHumanMethylationEPIC annotation: ilm10b4.hg19
Our RGset is all the same as yours except that our number of probes is smaller
class: RGChannelSetExtended
dim: 1051815 189
There is a mismatch between what is in our data and what is in the annotation, causing the fromRGChannelSet function to exit at:
sigA=sigA[TypeI.Green$AddressA,]
TypeI.Green$AddressA contains 28 addresses that are not present in sigA:
> miss.AddressA
[1] "24796887" "40756282" "97675569" "71604108" "86783537" "63637157" "70759207"
[8] "76692360" "96612879" "94744118" "9777333" "15775140" "77658838" "66756967"
[15] "87666365" "76694572" "98616263" "83666448" "4615247" "99746547" "77647953"
[22] "37668260" "45643890" "43743569" "76748380" "83784837" "81641881" "38808918"
These correspond to these probes:
> miss.cpg
[1] "cg23873430" "cg17012160" "cg07885130" "cg23322676" "cg21898755" "cg07141527"
[7] "cg00221207" "cg21485898" "cg03140487" "cg18282321" "cg12667768" "cg04151290"
[13] "cg04563419" "cg09456241" "cg09134714" "cg05667917" "cg03041463" "cg11745084"
[19] "cg17917325" "cg04918897" "cg13552506" "cg02046589" "cg10698069" "cg24834873"
[25] "cg12349676" "cg02177951" "cg13936208" "cg02158880"
Looks like at least the first three of these are listed in this file from the Genomics and Proteomics Core Facility at the Deutsches Krebsforschungszentrum:
https://www.dkfz.de/gpcf/uploads/media/170208_EPICmissingProbes.xlsx
where they found that their particular batch of EPIC chips were missing 598 probes.
Is there also a mis-match in the annotation itself?
ann.EPIC = getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
SnpI <- minfi::getProbeInfo(IlluminaHumanMethylationEPICanno.ilm10b4.hg19, type = "SnpI")
TypeI.Green <- rbind(minfi::getProbeInfo(IlluminaHumanMethylationEPICanno.ilm10b4.hg19,
type = "I-Green"),
SnpI[SnpI$Color == "Grn",])
# Are the TypeI.Green probes all in the EPIC annotation?
> dim(TypeI.Green)
[1] 49989 8
> table(TypeI.Green$Name %in% ann.EPIC@rownames)
FALSE TRUE
50 49939
Thank you.
Minor point:
In the 'fromRGChannelSet' function, the line:
pos=cbind(names(loc),as.character(GenomeInfoDb::seqnames(loc)),start(loc))
is generating a warning:
Warning message:
In cbind(names(loc), as.character(GenomeInfoDb::seqnames(loc)), :
number of rows of result is not a multiple of vector length (arg 3)
However, it appears to me that 'pos' is not used later in this function, so this line could be deleted.
In my forked version here https://github.com/DanielEWeeks/funtooNorm, I adjusted the 'fromRGChannelSet' so at to guard against the possibility that the annotation contains probes not observed in the rgSet. This function now works on my data set.
> R21sampleset = fromRGChannelSet(rgSet)
A covariance Matrix was build
Signal data loaded
Quantiles done
> R21sampleset
SampleSet object built from minfi
Data: 866150 positions and 189 samples
cell type: 1 2 3
528 quantiles
funtooNorm Normalization was not applied
Would appreciate it if you'd take a look at my changes to double-check to see if they make sense to you. Thank you.
Yes looks good. I am running Travis to check for errors and will merge. Thank you. Kathleen
Unfortunately, as far as I can tell, this patch did not make it into the version on Bioconductor, as I encountered this error again when I updated from Bioconductor. I didn't see this patch in the 'git log' of the Bioconductor version.
Thanks,
Dan
We'd like to apply 'funtooNorm' to some EPIC data, but the current stable version from Bioconductor (version 1.6.0) fails with this error message:
A similar error discussed at https://support.bioconductor.org/p/97229/, where trying to analyze EPIC data led to this error:
There Kasper Hansen suggests that:
Is this fixed in the development version (version 1.7.0)? Or, if not, any suggestions on how to extend/adjust funtooNorm so that it can process EPIC data?
Thank you, Dan Weeks