Bioconductor / Rsamtools

Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import
https://bioconductor.org/packages/Rsamtools
Other
27 stars 27 forks source link

Rsamtools::seqinfo() with corrupt BamFile messes up R session #3

Closed kjohnsen closed 6 years ago

kjohnsen commented 6 years ago

I'm developing a package that uses seqinfo() on a BamFileList (but I've narrowed the problem down to BamFile), and apparently it messes up the R session so that it cannot make any more system calls.

To reproduce:

bamInfo <- Rsamtools::seqinfo(Rsamtools::BamFile('corrupt.bam'))
Sys.which('hello-there')  # breaks here

Output:

error: cannot popen '/usr/bin/which 'hello-there' 2>/dev/null', probable reason 'Cannot allocate memory'

I cannot even show you sessionInfo after this has happened:

> sessionInfo()
Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) : 
  cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'

This is definitely not a RAM problem--the system has got plenty. Strangely enough, the function seems to work still: I still get the seqinfo I need. I'm not sure exactly what's wrong with the one BAM file I've been using, but this hasn't happened with any others. I can get you the one that's been causing this problem if you'd like.

mtmorgan commented 6 years ago

I think a simpler version will be scanBamHeader('corrupt.bam') but to help further yes the corrupt bam file is necessary.

kjohnsen commented 6 years ago

I have confirmed that the same problem does not happen with scanBamHeader().

And here's a link to the offending file.

kjohnsen commented 6 years ago

Here's my sessionInfo, by the way, before the problem arises:

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.6 (Santiago)

Matrix products: default
BLAS: /zapps/r/3.5.1/lib64/R/lib/libRblas.so
LAPACK: /zapps/r/3.5.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
 [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
 [5] LC_MONETARY=en_US.iso885915    LC_MESSAGES=en_US.iso885915
 [7] LC_PAPER=en_US.iso885915       LC_NAME=C
 [9] LC_ADDRESS=C                   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] Rsamtools_1.32.2     Biostrings_2.48.0    XVector_0.20.0
[4] GenomicRanges_1.32.6 GenomeInfoDb_1.16.0  IRanges_2.14.10
[7] S4Vectors_0.18.3     BiocGenerics_0.26.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.26.0        compiler_3.5.1         GenomeInfoDbData_1.1.0
[4] RCurl_1.95-4.11        BiocParallel_1.14.2    bitops_1.0-6

I should add that the exact same problem happens for me when running tests on Travis CI, so I don't think it's specific to this system.

mtmorgan commented 6 years ago

I have not been able to reproduce this. Can you create a file "issue-3.R"

bamInfo <- Rsamtools::seqinfo(Rsamtools::BamFile('zy14_wt_cut_filt.bam'))
Sys.which('hello-there')
sessionInfo()

Then confirm that it still fails when run from the command line

R --vanilla -f issue-3.R

and then report the output of

R --vanilla -d valgrind -f issue-3.R

(this will take some time, but not more than a minute or so)?

The only unusual thing is the locale "en_US.iso885915" versus for example Sys.setlocale("LC_ALL", "en_US.UTF-8")

kjohnsen commented 6 years ago

By running R --vanilla -f issue-3.R, I didn't get any problems, but when I ran without --vanilla, it failed the same way.

valgrind did find problems (even with --vanilla) though. It ran for several minutes until running out of memory, and produced the following:

valgrind-out.txt

mtmorgan commented 6 years ago

I guess there's an index file as well, something like zy14_wt_cut_filt.bam.bai ?

kjohnsen commented 6 years ago

Yes, there is

mtmorgan commented 6 years ago

Care to share?

kjohnsen commented 6 years ago

zy14_wt_cut_filt.bam.zip

mtmorgan commented 6 years ago

I'll work on a fix but the problem is that the index file is corrupt and should be regenerated, e.g., Rsamtools::indexBam("zy14_wt_cut_filt.bam").

kjohnsen commented 6 years ago

Ah, I see. I didn't think the index was being used. Thanks!