HenrikBengtsson / affxparser

🔬 R package: This is the Bioconductor devel version of the affxparser package.
http://bioconductor.org/packages/devel/bioc/html/affxparser.html

Potential Bug in convertCDF() #23

Closed arubio2 closed 8 years ago

arubio2 commented 8 years ago

I tried to convert a Brainarray CDF into binary form and it segfaults. This happens on both Linux and macOS.

library(affxparser)
convertCdf("HTA_ASv3_hta20_Hs_ENSG.cdf", "HTA_ASv3_hta20_Hs_ENSG_bin.cdf")

And immediately:

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: .Call("R_affx_get_cdf_file_qc", filename, as.integer(units), as.integer(verbose), returnIndices, returnXY, returnLength, returnPMInfo, returnBackgroundInfo, returnType, returnQcNumbers)
 2: readCdfQc(filename)
 3: convertCdf("HTA_ASv3_hta20_Hs_ENSG.cdf", "HTA_ASv3_hta20_Hs_ENSG_bin.cdf")

The CDF was downloaded from http://mbni.org/customcdf/20.0.0/ensg.download/hta20_Hs_ENSG_20.0.0.zip

The sessionInfo is:

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] affxparser_1.42.0

Best regards,

Angel

HenrikBengtsson commented 8 years ago

Before worrying about conversion, do readCdfHeader() and readCdf() work on this file?

arubio2 commented 8 years ago

readCdfHeader() -> No problem
readCdf() -> Segmentation fault


HenrikBengtsson commented 8 years ago

I can reproduce this on Windows as well. For the record, here are the details on this file:

> p <- "hta20_Hs_ENSG.cdf"
> pathname <- "hta20_Hs_ENSG.cdf"
> str(as.list(file.info(pathname)))
List of 7
 $ size : num 3.34e+08
 $ isdir: logi FALSE
 $ mode :Class 'octmode'  int 438
 $ mtime: POSIXct[1:1], format: "2015-11-09 11:45:52"
 $ ctime: POSIXct[1:1], format: "2016-04-01 23:22:17"
 $ atime: POSIXct[1:1], format: "2016-04-01 23:22:17"
 $ exe  : chr "no"
> digest::digest(p, file=TRUE)
[1] "54da0300ae48837bc45e7927bed45dec"
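
To confirm that a local copy is the same file as the one described here, the checksum can also be compared with base R alone. This is only a minimal sketch: `matchesChecksum()` is a hypothetical helper, not part of affxparser, and it assumes the checksum reported above is an MD5 of the file contents (the default for `digest::digest()` with `file=TRUE`).

```r
# Hypothetical helper (not part of affxparser): compare the MD5 checksum of a
# file against an expected hex string, ignoring letter case.
matchesChecksum <- function(pathname, expected) {
  stopifnot(file.exists(pathname))
  actual <- unname(tools::md5sum(pathname))
  identical(tolower(actual), tolower(expected))
}

# Usage with the file from this thread, if it exists locally
pathname <- "hta20_Hs_ENSG.cdf"
if (file.exists(pathname)) {
  print(matchesChecksum(pathname, "54da0300ae48837bc45e7927bed45dec"))
}
```

A mismatch would suggest a truncated or different download rather than the file discussed in this issue.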

It's a text-based CDF with header:

> str(affxparser::readCdfHeader(pathname))
List of 12
 $ ncols      : int 2680
 $ nrows      : int 2572
 $ nunits     : int 35321
 $ nqcunits   : int 0
 $ refseq     : chr ""
 $ chiptype   : chr "hta20_Hs_ENSG"
 $ filename   : chr "./hta20_Hs_ENSG.cdf"
 $ rows       : int 2572
 $ cols       : int 2680
 $ probesets  : int 35321
 $ qcprobesets: int 0
 $ reference  : chr ""
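
Whether a CDF is text-based or binary can also be guessed without parsing it, by peeking at its first bytes. The sketch below is illustrative only: `guessCdfFormat()` is a hypothetical helper, and it assumes that text CDFs start with the section name `[CDF]` and that binary (XDA) CDFs start with the little-endian 32-bit integer 67. A text file with leading whitespace or a byte-order mark would be reported as "unknown".

```r
# Hypothetical helper (not part of affxparser): guess the CDF flavor from the
# first bytes of the file.
guessCdfFormat <- function(pathname) {
  con <- file(pathname, open = "rb")
  on.exit(close(con))
  head <- readBin(con, what = "raw", n = 5L)

  # Assumption: text CDFs begin with the "[CDF]" section header.
  # Raw vectors are compared directly to avoid embedded-nul errors.
  if (identical(head, charToRaw("[CDF]"))) {
    return("text")
  }

  # Assumption: binary (XDA) CDFs begin with the int32 magic number 67.
  if (length(head) >= 4L) {
    magic <- readBin(head[1:4], what = "integer", size = 4L, endian = "little")
    if (identical(magic, 67L)) {
      return("binary")
    }
  }

  "unknown"
}

pathname <- "hta20_Hs_ENSG.cdf"
if (file.exists(pathname)) {
  print(guessCdfFormat(pathname))
}
```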

It core dumps with:

> data <- readCdfUnits(pathname, units=1)
[core dump]

It also core dumps using the affyio package, e.g.

> data <- affyio::read.cdffile.list(pathname)
[core dump]

I suspect this CDF file has an invalid format or is corrupt in some sense, because neither affxparser nor affyio can read it, and the two packages have completely different code bases.
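
Since each of these calls takes down the R session, one way to repeat the comparison is to run every reader in its own Rscript process and look only at the exit status. This is a rough sketch under stated assumptions: `runsCleanly()` is a hypothetical helper, `Rscript` is assumed to live in `R.home("bin")`, and a crash is assumed to show up as a non-zero exit status.

```r
# Hypothetical helper: evaluate R code (given as a string) in a fresh Rscript
# process and report whether that process exited with status 0.
runsCleanly <- function(code) {
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript, args = c("-e", shQuote(code)),
                    stdout = FALSE, stderr = FALSE)
  identical(as.integer(status), 0L)
}

pathname <- "hta20_Hs_ENSG.cdf"
if (file.exists(pathname)) {
  # The same three readers that were tried in this thread
  calls <- c(
    header = sprintf("invisible(affxparser::readCdfHeader('%s'))", pathname),
    units  = sprintf("invisible(affxparser::readCdfUnits('%s', units=1))", pathname),
    affyio = sprintf("invisible(affyio::read.cdffile.list('%s'))", pathname)
  )
  print(vapply(calls, runsCleanly, logical(1)))
}
```

A FALSE here only says that the child process failed. It does not distinguish a segfault from an ordinary R error, such as a missing package.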

Related

We have seen similar problems before with CDFs of this chip type, cf. https://github.com/HenrikBengtsson/affxparser/issues/18.

HenrikBengtsson commented 8 years ago

I'll consider this a buggy CDF unless proven otherwise.