DoaneAS / rseqc

Automatically exported from code.google.com/p/rseqc
0 stars 0 forks source link

fails on ion torrent files ? or some other problem ? #17

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. install python 2.7 in debian squeeze via pythonbrewer
   install RSeQC v 2.3.1-2.3.4 in custom location 
2. run bam_stat.py on a bam file from ion torrent
3.

What is the expected output? What do you see instead?
karl@probook-debian:~/Lab/NGS_Unit/Runs/2013-01-10 RNA Seq/one$ bam_stat.py -i 
R_2013_01_09_10_33_23_user_SN1-34_Auto_user_SN1-34_33.bam
Traceback (most recent call last):
  File "/home/karl/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/home/karl/.pythonbrew/pythons/Python-2.7/bin/bam_stat.py", line 66, in <module>
    main()
  File "/home/karl/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/home/karl/.pythonbrew/pythons/Python-2.7/bin/bam_stat.py", line 61, in main
    obj = SAM.ParseBAM(options.input_file)
  File "/home/karl/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/home/karl/.pythonbrew/pythons/Python-2.7/lib/python2.7/site-packages/qcmodule/SAM.py", line 2418, in __init__
    self.samfile = pysam.Samfile(inputFile,'r')
  File "csamtools.pyx", line 597, in csamtools.Samfile.__cinit__ (pysam/csamtools.c:6475)
  File "csamtools.pyx", line 760, in csamtools.Samfile._open (pysam/csamtools.c:8203)
ValueError: file header is empty (mode='r') - is it SAM/BAM format?
karl@probook-debian:~/Lab/NGS_Unit/Runs/2013-01-10 RNA Seq/one$ 

What version of the product are you using? On what operating system?
tried versions 2.3.1-2.3.4 on debian squeeze using python 2.7 installed locally 
via pythonbrewer

Please provide any additional information below.
The bam file seems to be valid:
karl@probook-debian:~/Lab/NGS_Unit/Runs/2013-01-10 RNA Seq/one$ samtools view 
-H R_2013_01_09_10_33_23_user_SN1-34_Auto_user_SN1-34_33.bam
@HD VN:1.0  SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chrX LN:155270560
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr20    LN:63025520
@SQ SN:chrY LN:59373566
@SQ SN:chr19    LN:59128983
@SQ SN:chr22    LN:51304566
@SQ SN:chr21    LN:48129895
@SQ SN:chr6_ssto_hap7   LN:4928567
@SQ SN:chr6_mcf_hap5    LN:4833398
@SQ SN:chr6_cox_hap2    LN:4795371
@SQ SN:chr6_mann_hap4   LN:4683263
@SQ SN:chr6_apd_hap1    LN:4622290
@SQ SN:chr6_qbl_hap6    LN:4611984
@SQ SN:chr6_dbb_hap3    LN:4610396
@SQ SN:chr17_ctg5_hap1  LN:1680828
@SQ SN:chr4_ctg9_hap1   LN:590426
@SQ SN:chr1_gl000192_random LN:547496
@SQ SN:chrUn_gl000225   LN:211173
@SQ SN:chr4_gl000194_random LN:191469
@SQ SN:chr4_gl000193_random LN:189789
@SQ SN:chr9_gl000200_random LN:187035
@SQ SN:chrUn_gl000222   LN:186861
@SQ SN:chrUn_gl000212   LN:186858
@SQ SN:chr7_gl000195_random LN:182896
@SQ SN:chrUn_gl000223   LN:180455
@SQ SN:chrUn_gl000224   LN:179693
@SQ SN:chrUn_gl000219   LN:179198
@SQ SN:chr17_gl000205_random    LN:174588
@SQ SN:chrUn_gl000215   LN:172545
@SQ SN:chrUn_gl000216   LN:172294
@SQ SN:chrUn_gl000217   LN:172149
@SQ SN:chr9_gl000199_random LN:169874
@SQ SN:chrUn_gl000211   LN:166566
@SQ SN:chrUn_gl000213   LN:164239
@SQ SN:chrUn_gl000220   LN:161802
@SQ SN:chrUn_gl000218   LN:161147
@SQ SN:chr19_gl000209_random    LN:159169
@SQ SN:chrUn_gl000221   LN:155397
@SQ SN:chrUn_gl000214   LN:137718
@SQ SN:chrUn_gl000228   LN:129120
@SQ SN:chrUn_gl000227   LN:128374
@SQ SN:chr1_gl000191_random LN:106433
@SQ SN:chr19_gl000208_random    LN:92689
@SQ SN:chr9_gl000198_random LN:90085
@SQ SN:chr17_gl000204_random    LN:81310
@SQ SN:chrUn_gl000233   LN:45941
@SQ SN:chrUn_gl000237   LN:45867
@SQ SN:chrUn_gl000230   LN:43691
@SQ SN:chrUn_gl000242   LN:43523
@SQ SN:chrUn_gl000243   LN:43341
@SQ SN:chrUn_gl000241   LN:42152
@SQ SN:chrUn_gl000236   LN:41934
@SQ SN:chrUn_gl000240   LN:41933
@SQ SN:chr17_gl000206_random    LN:41001
@SQ SN:chrUn_gl000232   LN:40652
@SQ SN:chrUn_gl000234   LN:40531
@SQ SN:chr11_gl000202_random    LN:40103
@SQ SN:chrUn_gl000238   LN:39939
@SQ SN:chrUn_gl000244   LN:39929
@SQ SN:chrUn_gl000248   LN:39786
@SQ SN:chr8_gl000196_random LN:38914
@SQ SN:chrUn_gl000249   LN:38502
@SQ SN:chrUn_gl000246   LN:38154
@SQ SN:chr17_gl000203_random    LN:37498
@SQ SN:chr8_gl000197_random LN:37175
@SQ SN:chrUn_gl000245   LN:36651
@SQ SN:chrUn_gl000247   LN:36422
@SQ SN:chr9_gl000201_random LN:36148
@SQ SN:chrUn_gl000235   LN:34474
@SQ SN:chrUn_gl000239   LN:33824
@SQ SN:chr21_gl000210_random    LN:27682
@SQ SN:chrUn_gl000231   LN:27386
@SQ SN:chrUn_gl000229   LN:19913
@SQ SN:chrM LN:16571
@SQ SN:chrUn_gl000226   LN:15008
@SQ SN:chr18_gl000207_random    LN:4262
@RG ID:UFAT8    PL:IONTORRENT   PU:PGM/314R LB:hg19 FO:TACGTACGTCTGAGCATCGATCGATGTACA
GCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAG
CATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACA
GCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAG
CATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACA
GCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAG
CATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGA  DT:2013-0
1-09T13:55:02+0100  PG:tmap SM:14704-1_one  KS:TCAG CN:TorrentServer/sn11c110730
@PG ID:bc   PN:BaseCaller   VN:3.4-7/48562  CL:BaseCaller --trim-qual-cutoff 15 
--trim-qual-window-size 30 --trim-adapter-cutoff 16 --calibration-file 
basecaller_results/recalibration/flowQVtable.txt --phase-estimation-file 
basecaller_results/recalibration/BaseCaller.json --input-dir=sigproc_results 
--librarykey=TCAG --tfkey=ATCG --run-id=UFAT8 --output-dir=basecaller_results 
--block-col-offset 0 --block-row-offset 0 
--datasets=basecaller_results/datasets_pipeline.json --trim-adapter 
ATCACCGACTGCCCATAGAGAGGCTGAGAC --barcode-filter 0.01
@PG ID:tmap VN:3.4.0    CL:mapall -n 12 -f 
/results/referenceLibrary/tmap-f3/hg19/hg19.fasta -r 
basecaller_results/rawlib.basecaller.bam -v -Y -u -o 0 stage1 map4
karl@probook-debian:~/Lab/NGS_Unit/Runs/2013-01-10 RNA Seq/one$ 

and 

karl@probook-debian:~/Lab/NGS_Unit/Runs/2013-01-10 RNA Seq/one$ 
../two/bamUtil_1.0.6/bamUtil/bin/bam validate --in 
R_2013_01_09_10_33_23_user_SN1-34_Auto_user_SN1-34_33.bam

Number of records read = 231220
Number of valid records = 231220

TotalReads  231220.00
MappedReads 215118.00
PairedReads 0.00
ProperPair  0.00
DuplicateReads  0.00
QCFailureReads  0.00

MappingRate(%)  93.04
PairedReads(%)  0.00
ProperPair(%)   0.00
DupRate(%)  0.00
QCFailRate(%)   0.00

TotalBases  25413949.00
BasesInMappedReads  24654962.00
Returning: 0 (SUCCESS)
karl@probook-debian:~/Lab/NGS_Unit/Runs/2013-01-10 RNA Seq/one$ 

Original issue reported on code.google.com by karl.kas...@gmail.com on 10 Jan 2013 at 8:55

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
karl@probook-debian:~/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/bin$ 
./bam_stat.py --version
bam_stat.py 2.3.4
karl@probook-debian:~/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/bin$ 
./bam_stat.py -i /home/karl/Lab/NGS_Unit/Runs/2013-01-10\ RNA\ 
Seq/one/R_2013_01_09_10_33_23_user_SN1-34_Auto_user_SN1-34_33.bam
BAM/SAM file has no header section. Exit!
karl@probook-debian:~/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/bin$

Original comment by karl.kas...@gmail.com on 10 Jan 2013 at 9:25

GoogleCodeExporter commented 9 years ago
seems to be a problem with the bam header, this i get when i run it on the same 
bam file after conversion to sam:

karl@probook-debian:~/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/bin$ 
./bam_stat.py -i ~/Lab/NGS_Unit/Runs/2013-01-10\ RNA\ Seq/two/R_two.sam
Traceback (most recent call last):
  File "./bam_stat.py", line 62, in <module>
    main()
  File "./bam_stat.py", line 57, in main
    obj = SAM.ParseBAM(options.input_file)
  File "/home/karl/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/lib/python2.7/site-packages/qcmodule/SAM.py", line 2319, in __init__
    if len(self.samfile.header) ==0:
  File "csamtools.pyx", line 1123, in csamtools.Samfile.header.__get__ (lib/pysam/csamtools.c:11562)
ValueError: unknown field code 'FO' in record 'RG'
karl@probook-debian:~/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/bin$

The problem is described here: 
http://code.google.com/p/pysam/issues/detail?id=108 but applying these change 
to csamtools.pyx before RSeQC compilation does not help.

Original comment by karl.kas...@gmail.com on 10 Jan 2013 at 10:08

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
OK, got it working:

get a clean install of python 2.7
install cython by: easy_install cython
download pysam-0.7 from here 
http://code.google.com/p/pysam/downloads/detail?name=pysam-0.7.tar.gz&can=2&q=
unpack pysam
Patch csamtools.pyx like described here: 
http://code.google.com/p/pysam/issues/detail?id=108
run python setup.py install in the pysam directory
This will install a version of pysam that can read ion torrent files.

Then install RSeQC 2.3.4. Then go to the python site-packages directory of 
RSeQC and delete all modules that are from the pysam install of RSeQC so RSeQC 
uses the psam version we installed into the python installation. (just delete 
the files that are also present in your python dist-packages directory)

voila now it works:

karl@probook-debian:~/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/bin$ 
./bam_stat.py -i /home/karl/Lab/NGS_Unit/Runs/2013-01-10\ RNA\ 
Seq/two/R_two.sam 
Load SAM file ...  Done

#==================================================
Total Records:                231220

QC failed:                    0
Optical/PCR duplicate:        0
Non Primary Hits              0
Unmapped reads:               16102
Multiple mapped reads:        0

Uniquely mapped:              215118
Read-1:                       0
Read-2:                       0
Reads map to '+':             107393
Reads map to '-':             107725
Non-splice reads:             215118
Splice reads:                 0
Reads mapped in proper pairs: 0
karl@probook-debian:~/Lab/NGS_Unit/Software/rnaseq/rseqc/RSeQC/bin$ 

easy, after 4 hrs of fiddling... :)

Original comment by karl.kas...@gmail.com on 10 Jan 2013 at 10:37