HenrikBengtsson / affxparser

🔬 R package: This is the Bioconductor devel version of the affxparser package.
http://bioconductor.org/packages/devel/bioc/html/affxparser.html

CORE DUMP: readCelHeader() can cause R to core dump #16

Closed HenrikBengtsson closed 9 years ago

HenrikBengtsson commented 9 years ago

@benilton shared a problematic CEL file that causes R and affxparser to core dump:

> library("affxparser")
> pathname <- "rawData/affxparser,problematic/GenomeWideSNP_6/MC45.CEL"
> file.info(pathname)$size
[1] 69102403
> digest::digest(file=pathname)
[1] "0f7513b00a92191092c085182bdc754a"

The short story is that readCelHeader() core dumps, whereas readCel() does not, at least as far as I can tell.

readCelHeader() core dump

>  hdr <- readCelHeader(pathname)
terminate called after throwing an instance of
'affymetrix_calvin_exceptions::FileNotOpenException'
This application has requested the Runtime to terminate
it in an unusual way.  Please contact the application's
support team for more information.

Likewise, readCelRectangle() and readCelIntensities() core dump because they both use readCelHeader() internally.

readCel() gives an error (no core dump)

If one uses readCel(), with .checkArgs=FALSE to avoid calling readCelHeader(), one only gets an error:

> data <- readCel(pathname, .checkArgs=FALSE)
Error in readCel(pathname, .checkArgs = FALSE) :
  Unable to read file: rawData/affxparser,problematic/GenomeWideSNP_6/MC45.CEL
> traceback()
2: .Call("R_affx_get_cel_file", filename, readHeader, readIntensities,
       readXY, readXY, readPixels, readStdvs, readOutliers, readMasked,
       indices, as.integer(verbose), PACKAGE = "affxparser")
1: readCel(pathname, .checkArgs = FALSE)

NOTE: This suggests the CEL file is corrupt/truncated.

readCelUnits() gives an error (no core dump)

Again, some tricks are needed to avoid calling readCelHeader() internally:

> pathnameCDF <- "annotationData/chipTypes/GenomeWideSNP_6/GenomeWideSNP_6.cdf"
> cdf <- readCdfUnits(pathnameCDF, readIndices=TRUE, units=1:10)
> data <- readCelUnits(pathname, cdf=cdf, verbose=TRUE)
Reordering cell indices to optimize speed...
Reordering cell indices to optimize speed...done
Reading 10*24.6=246 cells from 1 CEL files...
 Reading CEL data for array #1...
attempting to read: rawData/affxparser,problematic/GenomeWideSNP_6/MC45.CEL
Error in readCel(filename, indices = indices, readHeader = FALSE, readOutliers = FALSE,  :
  Unable to read file: rawData/affxparser,problematic/GenomeWideSNP_6/MC45.CEL

Alternatives that parse the file using R (not the Fusion SDK) give an error (no core dump)

> hdr <- readCcgHeader(pathname)
> names(hdr)
[1] "filename"   "fileHeader" "dataHeader"

> data <- readCcg(pathname)
Error in readBin(con, what = raw(), n = 2 * nchars) :
  invalid 'n' argument
> traceback()
5: readBin(con, what = raw(), n = 2 * nchars)
4: readWString(con)
3: .readCcgDataSet(con, fileOffset = offset)
2: .readCcgDataGroups(con, .filter = .filter$data, .fileHeader = fhdr)
1: readCcg(pathname, verbose = 10)

Session info

> sessionInfo()
R version 3.2.0 Patched (2015-05-02 r68310)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] affxparser_1.41.0-9000

loaded via a namespace (and not attached):
[1] tools_3.2.0       R.methodsS3_1.7.0 R.utils_2.0.2     R.oo_1.19.0

HenrikBengtsson commented 9 years ago

ACTION: R / affxparser should never core dump, so the internal Fusion SDK exception (native C++ code) should at least bubble up to the R level as an error message. Never a core dump.

kasperdanielhansen commented 9 years ago

Totally agree, but to fix this we need to learn enough about C++ exceptions to catch this. Of course this is possible; just a matter of how much time it takes to learn.


HenrikBengtsson commented 9 years ago

I think it's just a matter of using something like:

try {
   ...
} catch (const FileNotOpenException& ex) {
   // generate R error.
}

The class: FileNotOpenException
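
To make the idea concrete, here is a minimal, self-contained sketch of catching the exception at the boundary and turning it into an error message instead of letting it reach std::terminate(). Everything in namespace `sketch` is hypothetical: the exception class is only a stand-in for the Fusion SDK's affymetrix_calvin_exceptions::FileNotOpenException, and `parse_cel_header()`, `Result` and `read_cel_header_safely()` are illustration names, not affxparser or SDK functions. In the package itself, the collected message would presumably be passed on to R's C-level error mechanism (e.g. Rf_error()) after the try/catch has finished; that step is left out so the sketch compiles without R.

```cpp
#include <exception>
#include <string>

namespace sketch {

// Hypothetical stand-in for the SDK exception; the real class may differ.
class FileNotOpenException {};

// Hypothetical parser that fails the way the SDK does on a bad CEL file.
// Assumption for the demo only: any path containing "problematic" fails.
void parse_cel_header(const std::string& pathname) {
    if (pathname.find("problematic") != std::string::npos) {
        throw FileNotOpenException();
    }
}

// What gets handed back across the language boundary.
struct Result {
    bool ok;
    std::string message;  // empty on success
};

// Boundary wrapper: no exception thrown by the parser escapes this
// function, so an uncaught exception can no longer abort the process.
Result read_cel_header_safely(const std::string& pathname) {
    try {
        parse_cel_header(pathname);
        return {true, ""};
    } catch (const FileNotOpenException&) {
        // Catch by const reference to avoid copying/slicing the exception.
        return {false, "Failed to parse header of CEL file: " + pathname};
    } catch (const std::exception& ex) {
        return {false, std::string("Unexpected C++ exception: ") + ex.what()};
    } catch (...) {
        // Last resort for exception types unrelated to std::exception.
        return {false, "Unknown C++ exception"};
    }
}

}  // namespace sketch
```

Calling `sketch::read_cel_header_safely()` on the problematic path returns `ok == false` with a readable message, while any other path returns `ok == true`. The trailing `catch (...)` is there because a wrapper like this cannot know every exception type a third-party library might throw.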

HenrikBengtsson commented 9 years ago

affxparser 1.41.2 available on Bioc devel no longer core dumps; affxparser now catches Fusion SDK C++ exceptions and reports them up as standard R errors instead, e.g.

> library("affxparser")
> pathname <- "rawData/affxparser,problematic/GenomeWideSNP_6/MC45.CEL"

> hdr <- readCelHeader(pathname)
Error in readCelHeader(pathname) :
  [affxparser Fusion SDK exception] Failed to parse header of CEL file: rawData/
affxparser,problematic/GenomeWideSNP_6/MC45.CEL

> data <- readCel(pathname)
Error in readCelHeader(filename) :
  [affxparser Fusion SDK exception] Failed to parse header of CEL file: rawData/
affxparser,problematic/GenomeWideSNP_6/MC45.CEL

FYI, @benilton (in case you don't get notification on this issue)