GreenwoodLab / funtooNorm

6 stars 7 forks source link

Fails on EPIC data #44

Closed DanielEWeeks closed 5 years ago

DanielEWeeks commented 5 years ago

We'd like to apply 'funtooNorm' to some EPIC data, but the current stable version from Bioconductor (version 1.6.0) fails with this error message:

Error in minfi::getGreen(myRGChannelSet)[TypeI.Green$AddressA, ]:
    subscript out of bounds.

A similar error discussed at https://support.bioconductor.org/p/97229/, where trying to analyze EPIC data led to this error:

mySampleSet <- fromRGChannelSet(RGSet_combined)
Error in minfi::getRed(myRGChannelSet)[controlTable$Address, ] : 
  subscript out of bounds

There Kasper Hansen suggests that:

The authors of funtooNorm should modify their code to work with subsetted objects. This is something we now fully support in minfi and they can look in minfi to see examples of how to write it.

One advantage of working with subsetted objects is to allow things like this.

Is this fixed in the development version (version 1.7.0)? Or, if not, any suggestions on how to extend/adjust funtooNorm so that it can process EPIC data?

Thank you, Dan Weeks

kkleinoros commented 5 years ago

Hi Dan, I will look into this. When minfi updates their functions we need to update as well. I will let you know when I have a fix. Kathleen

From: Daniel E. Weeks notifications@github.com Sent: January 7, 2019 1:35 PM To: GreenwoodLab/funtooNorm funtooNorm@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [GreenwoodLab/funtooNorm] Fails on EPIC data (#44)

We'd like to apply 'funtooNorm' to some EPIC data, but the current stable version from Bioconductor (version 1.6.0) fails with this error message:

Error in minfi::getGreen(myRGChannelSet)[TypeI.Green$AddressA, ]:

subscript out of bounds.

A similar error discussed at https://support.bioconductor.org/p/97229/, where trying to analyze EPIC data led to this error:

mySampleSet <- fromRGChannelSet(RGSet_combined)

Error in minfi::getRed(myRGChannelSet)[controlTable$Address, ] :

subscript out of bounds

There Kasper Hansen suggests that:

The authors of funtooNorm should modify their code to work with subsetted objects. This is something we now fully support in minfi and they can look in minfi to see examples of how to write it.

One advantage of working with subsetted objects is to allow things like this.

Is this fixed in the development version (version 1.7.0)? Or, if not, any suggestions on how to extend/adjust funtooNorm so that it can process EPIC data?

Thank you, Dan Weeks

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/GreenwoodLab/funtooNorm/issues/44, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AHmO-b63bvAogCmQxn8J2sDiHYYyulx7ks5vA5NRgaJpZM4Z0A2r.

DanielEWeeks commented 5 years ago

Dear Kathleen,

Thank you for looking into this. I think a similar problem in a function in minfi was resolved early last year, as discussed in https://github.com/hansenlab/minfi/issues/95

So studying how this code was revised might be helpful:

The 'getSnpBeta' code in 'minfi' has been adjusted to avoid this subsetting error:

Previous version:

getSnpBeta <-
function (object)
{
#    .isRGOrStop(object)
    manifest <- getManifest(object)
    snpProbesII <- getProbeInfo(object, type = "SnpII")$Name
    M.II <- getGreen(object)[getProbeInfo(object, type = "SnpII")$AddressA,
        , drop = FALSE]
    U.II <- getRed(object)[getProbeInfo(object, type = "SnpII")$AddressA,
        , drop = FALSE]
    snpProbesI.Green <- getProbeInfo(object, type = "SnpI")[getProbeInfo(object,
        type = "SnpI")$Color == "Grn", ]
    snpProbesI.Red <- getProbeInfo(object, type = "SnpI")[getProbeInfo(object,
        type = "SnpI")$Color == "Red", ]
    M.I.Red <- getRed(object)[snpProbesI.Red$AddressB, , drop = FALSE]
    U.I.Red <- getRed(object)[snpProbesI.Red$AddressA, , drop = FALSE]
    M.I.Green <- getGreen(object)[snpProbesI.Green$AddressB,
        , drop = FALSE]
    U.I.Green <- getGreen(object)[snpProbesI.Green$AddressA,
        , drop = FALSE]
    M <- rbind(M.II, M.I.Red, M.I.Green)
    U <- rbind(U.II, U.I.Red, U.I.Green)
    rownames(M) <- rownames(U) <- c(snpProbesII, snpProbesI.Red$Name,
        snpProbesI.Green$Name)
    beta <- M/(U + M + 100)
    return(beta)
}

Newer version:

getSnpBeta <- function(object) {
    # Check input
    .isRGOrStop(object)

    # Extract and processSNP probes
    snpProbesI <- getProbeInfo(object, type = "SnpI")
    snpProbesII <- getProbeInfo(object, type = "SnpII")
    snpProbesI.Green <- snpProbesI[snpProbesI$Color == "Grn", , drop = FALSE]
    snpProbesI.Red <- snpProbesI[snpProbesI$Color == "Red", , drop = FALSE]

    # Extract Red and Green channels
    Green <- getGreen(object)
    Red <- getRed(object)

    # Construct M and U
    M.II <- Green[snpProbesII$AddressA, , drop = FALSE]
    U.II <- Red[snpProbesII$AddressA, , drop = FALSE]
    M.I.Red <- Red[snpProbesI.Red$AddressB, , drop = FALSE]
    U.I.Red <- Red[snpProbesI.Red$AddressA, , drop = FALSE]
    M.I.Green <- Green[snpProbesI.Green$AddressB, , drop = FALSE]
    U.I.Green <- Green[snpProbesI.Green$AddressA, , drop = FALSE]
    M <- rbind(M.II, M.I.Red, M.I.Green)
    U <- rbind(U.II, U.I.Red, U.I.Green)
    rownames(M) <- rownames(U) <- c(
        snpProbesII$Name,
        snpProbesI.Red$Name,
        snpProbesI.Green$Name)

    # Compute beta
    M / (U + M + 100)
}
DanielEWeeks commented 5 years ago

Perhaps you aren't done with the fix, but I noticed your commit, so we tried it out, and it still generates a similar error:

R21sampleset = fromRGChannelSet(rgSet)

Output:

A covariance Matrix was build
Error in sigA[TypeI.Green$AddressA, ] : subscript out of bounds

Thank you.

kkleinoros commented 5 years ago

Hi Daniel, Yes, I am still testing.

kkleinoros commented 5 years ago

Daniel, I have just successfully tested funtooNorm on a large EPIC dataset. I would suggest upgrading your R and bioconductor to the latest version. I was using minfi version 1.28.0 Let me know if that helps. Kathleen

kkleinoros commented 5 years ago

if you call your RGset, do you see IlluminaHumanMethylationEPIC as below?

RGset class: RGChannelSetExtended dim: 1052641 309 metadata(0): assays(5): Green Red GreenSD RedSD NBeads rownames(1052641): 1600101 1600111 ... 99810990 99810992 rowData names(0): colnames(309): 200590490035_R01C01 200590490035_R02C01 ... 200970120035_R06C01 200970120035_R07C01 colData names(10): Sample_Name Sample_Well ... filenames cell_type Annotation array: IlluminaHumanMethylationEPIC annotation: ilm10b4.hg19

DanielEWeeks commented 5 years ago

Our RGset is all the same as yours except that our number of probes is smaller

class: RGChannelSetExtended 
dim: 1051815 189 

There is a mismatch between what is in our data and what is in the annotation, causing the fromRGChannelSet function to exit at:

sigA=sigA[TypeI.Green$AddressA,]

TypeI.Green$AddressA contains 28 addresses that are not present in sigA:

> miss.AddressA
 [1] "24796887" "40756282" "97675569" "71604108" "86783537" "63637157" "70759207"
 [8] "76692360" "96612879" "94744118" "9777333"  "15775140" "77658838" "66756967"
[15] "87666365" "76694572" "98616263" "83666448" "4615247"  "99746547" "77647953"
[22] "37668260" "45643890" "43743569" "76748380" "83784837" "81641881" "38808918"

These correspond to these probes:

> miss.cpg
 [1] "cg23873430" "cg17012160" "cg07885130" "cg23322676" "cg21898755" "cg07141527"
 [7] "cg00221207" "cg21485898" "cg03140487" "cg18282321" "cg12667768" "cg04151290"
[13] "cg04563419" "cg09456241" "cg09134714" "cg05667917" "cg03041463" "cg11745084"
[19] "cg17917325" "cg04918897" "cg13552506" "cg02046589" "cg10698069" "cg24834873"
[25] "cg12349676" "cg02177951" "cg13936208" "cg02158880"

Looks like at least the first three of these are listed in this file from the Genomics and Proteomics Core Facility at the Deutsches Krebsforschungszentrum:

https://www.dkfz.de/gpcf/uploads/media/170208_EPICmissingProbes.xlsx

where they found that their particular batch of EPIC chips were missing 598 probes.

Is there also a mis-match in the annotation itself?

ann.EPIC = getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
SnpI <- minfi::getProbeInfo(IlluminaHumanMethylationEPICanno.ilm10b4.hg19, type = "SnpI")
TypeI.Green <- rbind(minfi::getProbeInfo(IlluminaHumanMethylationEPICanno.ilm10b4.hg19,
type = "I-Green"),
SnpI[SnpI$Color == "Grn",])

# Are the TypeI.Green probes all in the EPIC annotation?
> dim(TypeI.Green)
[1] 49989     8
> table(TypeI.Green$Name %in% ann.EPIC@rownames)

FALSE  TRUE 
   50 49939 

Thank you.

DanielEWeeks commented 5 years ago

Minor point:

In the 'fromRGChannelSet' function, the line:

    pos=cbind(names(loc),as.character(GenomeInfoDb::seqnames(loc)),start(loc))

is generating a warning:

Warning message:
In cbind(names(loc), as.character(GenomeInfoDb::seqnames(loc)),  :
  number of rows of result is not a multiple of vector length (arg 3)

However, it appears to me that 'pos' is not used later in this function, so this line could be deleted.

DanielEWeeks commented 5 years ago

In my forked version here https://github.com/DanielEWeeks/funtooNorm, I adjusted the 'fromRGChannelSet' so at to guard against the possibility that the annotation contains probes not observed in the rgSet. This function now works on my data set.

> R21sampleset = fromRGChannelSet(rgSet)
A covariance Matrix was build
Signal data loaded
Quantiles done
> R21sampleset
SampleSet object built from  minfi 
Data:  866150 positions and  189 samples 
   cell type: 1 2 3 
   528 quantiles 
funtooNorm Normalization was not applied 

Would appreciate it if you'd take a look at my changes to double-check to see if they make sense to you. Thank you.

kkleinoros commented 5 years ago

Yes looks good. I am running Travis to check for errors and will merge. Thank you. Kathleen

DanielEWeeks commented 3 years ago

Unfortunately, as far as I can tell, this patch did not make it into the version on Bioconductor, as I encountered this error again when I updated from Bioconductor. I didn't see this patch in the 'git log' of the Bioconductor version.
Thanks, Dan