HenrikBengtsson / illuminaio

🔬 R package: This is the Bioconductor devel version of the illuminaio package.
http://bioconductor.org/packages/devel/bioc/html/illuminaio.html
6 stars 2 forks source link

readBPM(): Hard-coded seek() jump?!? #6

Closed HenrikBengtsson closed 8 years ago

HenrikBengtsson commented 8 years ago

This is related to Issue #5. I just looked at the code for readBPM() and there is piece of code that looks weird and very unsafe to me:

    if (FALSE) {
        snpIndexByteOffset <- seek(con)
        snpIndex <- readInt(con, n=nEntries)
        ## for the 1M array, these are simply in order from 1 to 1072820.

        snpNamesByteOffset <- seek(con)
        snpNames <- rep("A", times=nEntries)
        for(i1 in 1:nEntries){
            snpNames[i1] <- readString(con)
        }
    }

    seek(con, where=15278138L)

That seek() is particularly scary to me.

Does any one know where this originates from and the rationale behind it? I don't know who wrote readBPM() in the first place.

HenrikBengtsson commented 8 years ago

Hmm... looking at humanmethylation450_15017482_v-1-2.bpm, a BPM file is more or less comma-separated plain-text data, whereas the code suggests it is binary. Also, from code inspection I'm pretty sure readBPM() is not doing anything useful. I would declare it non-functional and should not be used by anyone.

I'm leaving it at this for now.

@kasperdanielhansen, comment?

kasperdanielhansen commented 8 years ago

I don't know who wrote it either and I don't think it is really supported. Also, in my experience, Illumina always releases CSV versions of their BPM files on their websites. But I am mostly familiar with their methylation arrays.

Best, Kasper

On Tue, Jan 12, 2016 at 2:28 PM, Henrik Bengtsson notifications@github.com wrote:

Hmm... looking at humanmethylation450_15017482_v-1-2.bpm, a BPM file is more or less comma-separated plain-text data, whereas the code suggests it is binary. Also, from code inspection I'm pretty sure readBPM() is not doing anything useful. I would declare it non-functional and should not be used by anyone.

I'm leaving it at this for now.

@kasperdanielhansen https://github.com/kasperdanielhansen, comment?

— Reply to this email directly or view it on GitHub https://github.com/HenrikBengtsson/illuminaio/issues/6#issuecomment-171025316 .

ttriche commented 8 years ago

Keith Baggerly wrote it initially

--t

On Tue, Jan 12, 2016 at 10:49 AM, Henrik Bengtsson <notifications@github.com

wrote:

This is related to Issue #5 https://github.com/HenrikBengtsson/illuminaio/issues/5. I just looked at the code for readBPM() and there is piece of code https://github.com/HenrikBengtsson/illuminaio/blob/develop/R/readBPM.R#L76-L88 that looks weird and very unsafe to me:

if (FALSE) {
    snpIndexByteOffset <- seek(con)
    snpIndex <- readInt(con, n=nEntries)
    ## for the 1M array, these are simply in order from 1 to 1072820.

    snpNamesByteOffset <- seek(con)
    snpNames <- rep("A", times=nEntries)
    for(i1 in 1:nEntries){
        snpNames[i1] <- readString(con)
    }
}

seek(con, where=15278138L)

That seek() is particularly scary to me.

Does any one know where this originates from and the rationale behind it? I don't know who wrote readBPM() in the first place.

— Reply to this email directly or view it on GitHub https://github.com/HenrikBengtsson/illuminaio/issues/6.

ttriche commented 8 years ago

oh, readBPM not readIDAT.. agreed

BPMs are a horrid little wrapper around a CSV

--t

On Tue, Jan 12, 2016 at 11:28 AM, Henrik Bengtsson <notifications@github.com

wrote:

Hmm... looking at humanmethylation450_15017482_v-1-2.bpm, a BPM file is more or less comma-separated plain-text data, whereas the code suggests it is binary. Also, from code inspection I'm pretty sure readBPM() is not doing anything useful. I would declare it non-functional and should not be used by anyone.

I'm leaving it at this for now.

@kasperdanielhansen https://github.com/kasperdanielhansen, comment?

— Reply to this email directly or view it on GitHub https://github.com/HenrikBengtsson/illuminaio/issues/6#issuecomment-171025316 .

ttriche commented 8 years ago

I haven't seen an example lately where they didn't provide a CSV, and in any event, a few quick ninja edits in vim turns a BPM into a CSV

--t

On Tue, Jan 12, 2016 at 11:39 AM, Kasper Daniel Hansen < notifications@github.com> wrote:

I don't know who wrote it either and I don't think it is really supported. Also, in my experience, Illumina always releases CSV versions of their BPM files on their websites. But I am mostly familiar with their methylation arrays.

Best, Kasper

On Tue, Jan 12, 2016 at 2:28 PM, Henrik Bengtsson < notifications@github.com> wrote:

Hmm... looking at humanmethylation450_15017482_v-1-2.bpm, a BPM file is more or less comma-separated plain-text data, whereas the code suggests it is binary. Also, from code inspection I'm pretty sure readBPM() is not doing anything useful. I would declare it non-functional and should not be used by anyone.

I'm leaving it at this for now.

@kasperdanielhansen https://github.com/kasperdanielhansen, comment?

— Reply to this email directly or view it on GitHub < https://github.com/HenrikBengtsson/illuminaio/issues/6#issuecomment-171025316

.

— Reply to this email directly or view it on GitHub https://github.com/HenrikBengtsson/illuminaio/issues/6#issuecomment-171029486 .

HenrikBengtsson commented 8 years ago

I've deprecated readBPM() in the GitHub develop branch. If there's time, it could be brought back later, but as I see it, no one should use this function.

Do you agree? If so, I merge it into the master branch and then push it to Bioc devel.

ttriche commented 8 years ago

+1 from me, Kasper will have to intone on his end

kasperdanielhansen commented 8 years ago

Agreed. I have never used it and I'm not really willing to support it. And as far as I can see if is unnecessary.

Best, Kasper (Sent from my phone.)

On Jan 12, 2016, at 17:19, Tim Triche, Jr. notifications@github.com wrote:

+1 from me, Kasper will have to intone on his end

— Reply to this email directly or view it on GitHub.

HenrikBengtsson commented 8 years ago

illuminaio 0.13.1 pushed to Bioc devel.