fgcz / rawDiag

Brings Orbitrap mass spectrometry data to life; multi-platform, fast and colorful R package
https://bioconductor.org/packages/rawDiag

Error: Negative length vectors are not allowed #48

Closed dzolg closed 5 years ago

dzolg commented 5 years ago

Dear RawDiag Team,

I am trying to extract scans from a RAW file. MS2 extraction works, and MS1 extraction works in general, e.g. when I select only the first 100 scans. But whenever I request a large number of scans (such as all MS1 scans of a file), readScans returns:

Error in source(tfo) : negative length vectors are not allowed

~I suspect that one of the scans might be empty (have seen that before, but rarely). The behavior is file dependent, some run through, some don't. Are there verbose messages to find at which scan it goes wrong? If it is indeed an empty scan, can one try to catch this error?~

This seems to be a memory issue; searching for the error turns up quite a lot of hits. When I chunk the scans (5 x 1000 scans) it runs fine. So I guess the function does not scale well to ~5k MS1 scans (profile) or >80k MS2 scans (these were the test files that failed).

RawDiag 0.0.29, R 3.5.2 under 64bit Windows:

library(rawDiag)

file <- "02401_Ecoli_QC_R3.raw"
metaDat <- read.raw(file, rawDiag = FALSE)
# scan numbers of all MS1 scans
idx <- metaDat[which(metaDat$MSOrder == "Ms"), ]$scanNumber
scanDat <- readScans(file, scans = idx)

File that I am using: https://drive.google.com/open?id=1VN4U21jtg5bY10Bb9bnFEZ-mTfRMFKEY

Thanks for the support.

dzolg commented 5 years ago

Update: This seems to be a memory issue; searching for the error turns up quite a lot of hits. When I chunk the scans (5 x 1000 scans) it runs fine. So I guess the function does not scale well to ~5k MS1 scans (profile) or >80k MS2 scans (these were the test files that failed).
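
The chunking workaround described above could be sketched roughly as follows. This is only an illustration, not rawDiag code: `read_in_chunks` and `fake_reader` are hypothetical names, and `reader` stands in for a call such as `function(s) readScans(file, scans = s)`, assuming readScans returns one list element per requested scan as used in this thread.

```r
# Split scan numbers into chunks of at most `chunk_size` and read each chunk
# separately, so that no single call has to return all scans at once.
# `reader` is a placeholder for something like
#   function(s) readScans(file, scans = s)
read_in_chunks <- function(scans, reader, chunk_size = 1000L) {
  if (length(scans) == 0L) return(list())
  # ceiling(seq_along(...) / chunk_size) gives group labels 1,1,...,2,2,...
  groups <- ceiling(seq_along(scans) / chunk_size)
  chunks <- split(scans, groups)
  # Read chunk by chunk, then flatten one level: one list entry per scan
  per_chunk <- lapply(chunks, reader)
  unlist(per_chunk, recursive = FALSE, use.names = FALSE)
}

# Toy check with a fake reader that returns one list element per scan
fake_reader <- function(s) lapply(s, function(x) list(scan = x))
res <- read_in_chunks(1:2500, fake_reader, chunk_size = 1000L)
```

With a real file, the call would presumably look like `read_in_chunks(idx, function(s) readScans(file, scans = s))`; the chunk size of 1000 simply mirrors what worked here and may need tuning.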

cpanse commented 5 years ago

@dzolg pfff; this is massive ... Can you try a "divide and conquer" strategy? Maybe you can extract the desired information before the merging step.

Here is an example:

# Read one scan at a time and drop zero-intensity peaks before collecting results
system.time(scanDat <- lapply(idx, function(x) {
  scan <- readScans(file, x)[[1]]
  keep <- scan$intensity > 0
  scan$mZ <- scan$mZ[keep]
  scan$intensity <- scan$intensity[keep]
  scan
}))
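
The per-scan reduction step can also be pulled out into a small helper and checked on a toy scan without any raw file. This is a sketch only: `drop_zero_peaks` is a hypothetical name, and it assumes a scan is a list with parallel numeric vectors `mZ` and `intensity`, as in the snippet above.

```r
# Keep only the peaks with positive intensity in a single scan.
# Assumes `scan` is a list with parallel numeric vectors `mZ` and `intensity`.
drop_zero_peaks <- function(scan) {
  keep <- scan$intensity > 0
  scan$mZ <- scan$mZ[keep]
  scan$intensity <- scan$intensity[keep]
  scan
}

# Toy scan: two of the four peaks have zero intensity
toy <- list(mZ = c(100.1, 100.2, 100.3, 100.4),
            intensity = c(0, 5, 0, 12))
slim <- drop_zero_peaks(toy)
```

It would then plug into the loop as `lapply(idx, function(x) drop_zero_peaks(readScans(file, x)[[1]]))`. For profile-mode MS1 scans, where many points are presumably zero, this should shrink each scan before the results are collected.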

The snippet seems to work on our 64-core Linux box:

[Two screenshots attached: "screenshot 2019-01-23 at 14 40 20" and "screenshot 2019-01-23 at 14 44 00"]

Q.E.D.