HenrikBengtsson / aroma.seq

🔬 R package: aroma.seq: High-Throughput Sequence Analysis using the Aroma Framework
https://github.com/HenrikBengtsson/aroma.seq
0 stars 1 forks source link

tests/Bowtie2Alignment,SE.R: assertion error #29

Closed HenrikBengtsson closed 8 years ago

HenrikBengtsson commented 8 years ago

Package test tests/Bowtie2Alignment,SE.R now gives an error:

Running the tests in 'tests/Bowtie2Alignment,SE.R' failed.
Last 13 lines of output:
  BamDataSet:
  Name: TopHat-example
  Tags: gz,bowtie2
  Full name: TopHat-example,gz,bowtie2
  Subpath: Lambda_phage
  Number of files: 2
  Names: reads_1, reads_2 [2]
  Path (to the first file): bamData/TopHat-example,gz,bowtie2/Lambda_phage
  Total file size: 4.66 kB (4775 bytes)
  RAM: 0.00MB
  Number of unique target sequence sets: 1
  Error: getChecksum(bamZ) == getChecksum(bam) is not TRUE
  Execution halted

with

> findBowtie2()
                                           bowtie2
"/home/shared/cbc/software/bowtie2-latest/bowtie2"
attr(,"version")
[1] '2.2.6'

> sessionInfo()
R version 3.2.2 Patched (2015-11-20 r69676)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] aroma.seq_0.6.9-9000 aroma.core_2.14.0    R.devices_2.13.1
[4] R.filesets_2.9.0     R.utils_2.1.0        R.oo_1.19.0
[7] R.methodsS3_1.7.0

loaded via a namespace (and not attached):
 [1] DNAcopy_1.44.0             XVector_0.10.0
 [3] GenomicAlignments_1.6.1    GenomicRanges_1.22.1
 [5] BiocGenerics_0.16.1        zlibbioc_1.16.0
 [7] IRanges_2.4.3              BiocParallel_1.4.0
 [9] lattice_0.20-33            R.cache_0.12.0
[11] hwriter_1.3.2              GenomeInfoDb_1.6.1
[13] globals_0.5.9-9000         tools_3.2.2
[15] grid_3.2.2                 SummarizedExperiment_1.0.1
[17] parallel_3.2.2             Biobase_2.30.0
[19] latticeExtra_0.6-26        lambda.r_1.1.7
[21] futile.logger_1.4.1        gtools_3.5.0
[23] PSCBS_0.60.0               matrixStats_0.15.0
[25] digest_0.6.8               R.rsp_0.20.0-9000
[27] RColorBrewer_1.1-2         futile.options_1.0.0
[29] bitops_1.0-6               base64enc_0.1-3
[31] S4Vectors_0.8.3            codetools_0.2-14
[33] Rsamtools_1.22.0           Biostrings_2.38.1
[35] ShortRead_1.28.0           stats4_3.2.2
[37] future_0.8.9-9000          listenv_0.5.0-9000
``
HenrikBengtsson commented 8 years ago

This is because the read group CL in the BAM header carries information of the full system call used to call bowtie2 and they differ when called with compressed and non-compress FASTQ files, e.g.

> all.equal(getHeader(bamZ), getHeader(bam))
[1] "Component \"text\": Component \"@PG\": 1 string mismatch"

> getHeader(bam)$text$`@PG`
[1] "ID:bowtie2"                                                                                    
[2] "PN:bowtie2"                                                                                    
[3] "VN:2.2.6"                                                                                      
[4] "CL:\"/home/shared/cbc/software/bowtie2-latest/bowtie2-align-s --wrapper basic-0 -x src/refIndex/lambda_virus -S reads_1.sam -U src/reads_1.fq\""

> getHeader(bamZ)$text$`@PG`
[1] "ID:bowtie2"                                                                                    
[2] "PN:bowtie2"                                                                                    
[3] "VN:2.2.6"                                                                                      
[4] "CL:\"/home/shared/cbc/software/bowtie2-latest/bowtie2-align-s --wrapper basic-0 -x src/refIndex/lambda_virus -S reads_1.sam -U /tmp/386553.unp\""
HenrikBengtsson commented 8 years ago

This is an example of a problem behind Issue #25.