brentp / bwa-mips

Map sequence from Molecular Inversion Probes with BWA, strip arms, de-dup, ..., profit
MIT License

program crashes when trying to use picard #2

Open ninanorgren opened 9 years ago

ninanorgren commented 9 years ago

When I run the example files, the program crashes at the Picard step. It may be that it cannot find an input file. It produces only an empty sample.bam output file before it crashes.

Here is the complete error message:

++ R1=sample-202-20_S30_L001_R1_001.fastq.gz
++ R2=sample-202-20_S30_L001_R2_001.fastq.gz
++ pic='/home/picard-tools-1.129*'
++ mkdir -p results
++ rm -f results/sample.bam
++ python ../bwamips.py ref/chr6.fa mips-design.txt sample-202-20_S30_L001_R1_001.fastq.gz sample-202-20_S30_L001_R2_001.fastq.gz --threads 16 --picard-dir /home/picard-tools-1.129
bwa mem -p -C -M -t 16 -R '@RG\tID:sample-202-20_S30_L001\tSM:sample-202-20_S30_L001\tPL:illumina' -v 1 ref/chr6.fa '<python ../bwamips.py detag sample-202-20_S30_L001_R1_001.fastq.gz sample-202-20_S30_L001_R2_001.fastq.gz 5' | gzip -c - > /tmp/tmpXotfmU.sam.gz
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (151, 151, 151)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (151, 151)
[M::mem_pestat] mean and std.dev: (151.00, 0.00)
[M::mem_pestat] low and high boundaries for proper pairs: (151, 151)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[main] Version: 0.7.10-r789
[main] CMD: bwa mem -p -C -M -t 16 -R @RG\tID:sample-202-20_S30_L001\tSM:sample-202-20_S30_L001\tPL:illumina -v 1 ref/chr6.fa <python ../bwamips.py detag sample-202-20_S30_L001_R1_001.fastq.gz sample-202-20_S30_L001_R2_001.fastq.gz 5
[main] Real time: 1.568 sec; CPU: 5.443 sec
reading mips-design.txt
[Mon Mar 09 18:20:35 SGT 2015] net.sf.picard.sam.FixMateInformation INPUT=[/dev/stdin] OUTPUT=/dev/stdout SORT_ORDER=coordinate    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Mon Mar 09 18:20:35 SGT 2015] Executing as k@apollo1 on Linux; Picard version: 1.86(1363)

================
Alignment Report
================

Bases in genome: 171,115,067
Bases in target region: 85,128 (0.0497% of genome)

MIPs found: 1,413
Off-target reads: 847
Unmapped reads: 800

Observed / expected enrichment where expected is based on size
of target region relative to size of genome.

FOLD ENRICHMENT
===============
low - high: 1724.50 - 3353.32

Low estimate uses unmapped reads as well as off-target.

% READS ON TARGET
=================
46.18%

INFO    2015-03-09 18:20:35     FixMateInformation      Sorting input into queryname order.
[Mon Mar 09 18:20:35 SGT 2015] net.sf.picard.sam.FixMateInformation done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=504168448
FAQ:  http://sourceforge.net/apps/mediawiki/picard/index.php?title=Main_Page
Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. RG ID on SAMRecord not found in header: sample-202-20_S30_L001; File /dev/stdin; Line 1
Line: M00658:45:000000000-A3LA4:1:2101:17149:22610      77      *       0       0       **0       0       CAGACTCGGCGCGGATCGTGCGTGTTCATGATCATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGCTAATACCAACTTAGCCAGGCTGGTAGAGAAATAGTGACAACAGGCAATAAGACAGCGAGGGACTATAAAAGAACTGGGCAGA   BBBBBFFBBBBBGGF2EFGGCEGGGHDDGDGHHFHCBGH1EHHBF?EGAFDHBBHHE1BGDGEEE?//03344B43B3/33333330??00/22B2?B22222?2>?22?/2//<F1?11?100/----<<../000000../00:../;;     AS:i:0  XS:i:0  RG:Z:sample-202-20_S30_L001       BC:Z:GATGG
        at net.sf.samtools.SAMLineParser.reportErrorParsingLine(SAMLineParser.java:427)
        at net.sf.samtools.SAMLineParser.parseLine(SAMLineParser.java:331)
        at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:237)
        at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:225)
        at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:201)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
        at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
        at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:148)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
        at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:76)
Traceback (most recent call last):
  File "../bwamips.py", line 566, in <module>
    main()
  File "../bwamips.py", line 560, in main
    args.umi_length, args.picard_dir)
  File "../bwamips.py", line 437, in bwamips
    dedup_sam(dearm_sam(sam_gz, mips), get_umi, sam_out, mips)
  File "../bwamips.py", line 494, in dedup_sam
    out.write(str(aln) + "\n")
IOError: [Errno 32] Broken pipe

brentp commented 9 years ago

This works for me with picard-tools-1.119 and the update that I just pushed should make it work with your version. Let me know.
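
The exception in the log above ("RG ID on SAMRecord not found in header") means that a record carries an RG:Z tag whose ID is not declared on any @RG header line. A minimal sketch of a check for that condition on a SAM text stream follows. The function name `undeclared_read_groups` and the toy header are assumptions for illustration, not part of bwamips.py.

```python
def undeclared_read_groups(sam_lines):
    """Return RG IDs used on records but missing from the @RG header lines."""
    declared = set()
    missing = set()
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            # Header line: remember the ID of every @RG entry.
            if fields[0] == "@RG":
                for field in fields[1:]:
                    if field.startswith("ID:"):
                        declared.add(field[3:])
            continue
        # Alignment record: optional tags start at the 12th column.
        for tag in fields[11:]:
            if tag.startswith("RG:Z:") and tag[5:] not in declared:
                missing.add(tag[5:])
    return missing


# Toy input (hypothetical): a header without @RG, then one tagged record.
header = ["@SQ\tSN:chr6\tLN:1000\n"]
record = "read1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF\tRG:Z:sample-202-20_S30_L001\n"
print(undeclared_read_groups(header + [record]))
```

Running a check like this on the SAM that is piped into Picard would show whether the @RG line is being lost before FixMateInformation sees it.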

Flope commented 8 years ago

Hi, I have the same error. I tried two different versions of Picard tools, 1.9 and 1.119.

Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. RG ID on SAMRecord not found in header: 769P_AH25NKADXY_L001; File /dev/stdin; Line 1
Line: HISEQ:318:H25NKADXY:1:1102:19045:40460    77  *   0   0   *   *   0   0   TGGCAAAAGGCTGTTTCTTTTAAACACCCTTTTTACACTACCGTCGGATATCGGGAAGCTGAAGTGGCAAAAGGCTGTTTCTTTTAAACACCCTTTTTACACTACCGTCGGATCGTGCGTGTCGAT  CCCFFFFFHDHHHJIIJJJJJIJJJJIIJJJIJJGIJ

Any suggestions on what could be causing it?

Thanks

brentp commented 8 years ago

Do you see this error on the example data?

brentp commented 8 years ago

I just committed a change that might address this. If it does not, let me know and I'll make it output the de-armed SAM so that the user can run FixMateInformation on their own.
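
For readers who end up running FixMateInformation by hand on a de-armed SAM, the command can be sketched as below. The option names (INPUT, OUTPUT, SORT_ORDER, VALIDATION_STRINGENCY) are the ones Picard echoes in the logs in this thread. The per-tool jar name `FixMateInformation.jar`, the helper name `fixmate_command`, and the file names are assumptions; the jar layout differs between Picard releases, so check the install directory.

```python
import os


def fixmate_command(picard_dir, sam_in, bam_out, stringency="STRICT"):
    """Build (but do not run) a FixMateInformation command line."""
    # Assumption: the release ships one jar per tool inside picard_dir.
    jar = os.path.join(picard_dir, "FixMateInformation.jar")
    return [
        "java", "-jar", jar,
        "INPUT=" + sam_in,
        "OUTPUT=" + bam_out,
        "SORT_ORDER=coordinate",
        "VALIDATION_STRINGENCY=" + stringency,
    ]


# Hypothetical paths; pass the list to subprocess.check_call to execute it.
print(" ".join(fixmate_command("/home/picard-tools-1.119", "dearmed.sam", "sample.bam")))
```

STRICT matches the setting shown in the logs. Picard also accepts LENIENT and SILENT, but whether relaxing validation is appropriate here is a separate decision.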

Flope commented 8 years ago

Thank you very much for your quick response.

No, I did not see the error with the sample data. I was trying different options, and I am able to run my sequences and parameters with your example script, except when I replace the reference chromosome (chr6 in the example) with the one my MIPs are designed for (chr7).

Looking at the first error in this thread, it may not be the same as mine. This was my original error:

INFO    2016-01-22 16:30:14    FixMateInformation    Sorting input into queryname order.
[Fri Jan 22 16:30:19 EST 2016] picard.sam.FixMateInformation done. Elapsed time: 6.87 minutes.
Runtime.totalMemory()=757661696
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing text SAM file. Tag of type i should have signed decimal value; File /dev/stdin; Line 247729
Line: HISEQ:318:H25NKADXY:1:2210:5526:37163    675    chr1    11167510    60    107M13S    =    11167604    220    AAAAAACGTGATGGGCACATCTGGGCCTCCAGTTACCAGAAAGGGCACCTAAGAAGGCAGAAGGAAAAGGAATATTTTAATATTTTGAGCTCCTTCAAAGGTTTACAAGGAAACCTGGAA<)?77<AA5?A***5A>1?===<0=ABA############################################################################################    NM:i:3    MD:Z:62A19A8T15    AS:i:92    XS:i:20    RG:Z:769P_AH25NKADXY_L001    BC:Z:AGGGGG    OP:i:11167510    XI:i:None    XO:Z:AAAAAACGTGATGGGCACATCTGGGCCTCCAGTTACCAGAAAGGGCACCTAAGAAGGCAGAAGGAAAAGGAATATTTTAATATTTTGAGCTCCTTCAAAGGTTTACAAGGAAACCTGGAA    OC:Z:107M13S
    at htsjdk.samtools.SAMLineParser.reportErrorParsingLine(SAMLineParser.java:438)
    at htsjdk.samtools.SAMLineParser.parseTag(SAMLineParser.java:397)
    at htsjdk.samtools.SAMLineParser.parseLine(SAMLineParser.java:325)
    at htsjdk.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:247)
    at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:235)
    at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:211)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:514)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:488)
    at picard.sam.FixMateInformation.doWork(FixMateInformation.java:164)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
    at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
    at picard.sam.FixMateInformation.main(FixMateInformation.java:92)
Traceback (most recent call last):
  File "/soft/bwa-mips/bwamips.py", line 596, in <module>
    main()
  File "/soft/bwa-mips/bwamips.py", line 590, in main
    args.umi_length, args.picard_dir)
  File "/soft/bwa-mips/bwamips.py", line 450, in bwamips
    dedup_sam(dearm_sam(sam_gz, mips), get_umi if umi_length > 0 else None, sam_out, mips)
  File "/bwa-mips/bwamips.py", line 523, in dedup_sam
    out.write(str(aln) + "\n")
IOError: [Errno 32] Broken pipe

The error seems to occur at line 388 of bwamips.py, where mip.get can sometimes return "None" if no MIP match was found. That ends up in the output as the XI:i:None tag seen in the line above, which is an invalid SAM value for an integer tag.

I also tried your newer version. It seems to solve some of the issues, because I can now run it with chr7, but I get the following error when running it against the whole genome:

net.sf.picard.sam.FixMateInformation done. Elapsed time: 6.76 minutes.
Runtime.totalMemory()=757661696
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing text SAM file. RG ID on SAMRecord not found in header: 769P_AH25NKADXY_L001; File /dev/stdin; Line 1
Line: HISEQ:318:H25NKADXY:1:1101:9772:11080 141 *   0   0   *   *   0   0   ATTCACTTTCCCCTTCCCAGAATGGGGGCCTTTGGCCGGATGGTGACAAGTCGGGTGGTGGCGGTAGCGTTACAAAAAAAAAAAGAGATGTGGCACCAGTAAGGGCGTGTGAGGAGACGA    ########################################################################################################################    AS:i:0  XS:i:0  RG:Z:769P_AH25NKADXY_L001   BC:Z:AAAAAA
    at net.sf.samtools.SAMLineParser.reportErrorParsingLine(SAMLineParser.java:427)
    at net.sf.samtools.SAMLineParser.parseLine(SAMLineParser.java:331)
    at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:237)
    at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:225)
    at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:201)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:672)
    at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:650)
    at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:148)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
    at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:76)
Traceback (most recent call last):
  File "/soft/bwa-mips/bwamips.py", line 595, in <module>
    main()
  File "/soft/bwa-mips/bwamips.py", line 589, in main
    args.umi_length, args.picard_dir)
  File "/soft/bwa-mips/bwamips.py", line 447, in bwamips
    out.stdin, mips)
  File "/soft/bwa-mips/bwamips.py", line 499, in dedup_sam
    out.write(str(r) + '\n')
IOError: [Errno 32] Broken pipe

Thanks

brentp commented 8 years ago

The first error shows that you have a MIP in the FASTQ that wasn't in the design file. For the second error, if you could send a small FASTQ that reproduces it, I'll have a look.
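
The XI:i:None tag in the first error comes from formatting a missing lookup result as if it were an integer. One way to guard against that, sketched here under assumptions, is to skip integer tags that have no value. The helper names `format_int_tag` and `append_tags` are hypothetical and are not taken from bwamips.py; only the tag names OP and XI come from the log.

```python
def format_int_tag(name, value):
    """Return 'NAME:i:VALUE', or None when there is no integer to write."""
    if value is None:
        return None
    return "%s:i:%d" % (name, int(value))


def append_tags(sam_line, tags):
    """Append only the tags that have a value to a tab-separated SAM line."""
    formatted = (format_int_tag(name, value) for name, value in tags)
    extra = [tag for tag in formatted if tag is not None]
    return "\t".join([sam_line.rstrip("\n")] + extra)


# A read whose MIP lookup found nothing: XI is omitted instead of 'XI:i:None'.
line = "read1\t675\tchr1\t11167510\t60\t107M13S\t=\t11167604\t220\tACGT\tFFFF"
print(append_tags(line, [("OP", 11167510), ("XI", None)]))
```

Omitting the tag keeps the record valid for strict parsers. Writing a sentinel such as -1 would be an alternative, and which of the two suits the tool is a choice for the maintainer.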

Flope commented 8 years ago

Thanks again. I managed to make it work with picard-tools-1.119. It does not work with picard-tools-1.92, where I think I am getting the second error.

the first error shows that you have a mip in the fastq that wasn't in the design file.

I am not sure I understand that. Does it mean that the software found an amplicon in the sequences that wasn't listed in the design file? But isn't unspecific amplification expected?

Thanks.