broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License

MergeBamAlignment breaks on non-paired input #1145

Open jleinenbach opened 6 years ago

jleinenbach commented 6 years ago

Bug Report

Similar to #1115 (now fixed), MergeBamAlignment breaks on non-paired input. The input is a SAM file containing both paired and non-paired reads.

Affected tool(s)

MergeBamAlignment

Affected version(s)

Picard version 2.18.1 Snapshot

morgancolp commented 6 years ago

Hey, just wondering if anyone has an update on this. A colleague of mine hit a similar problem using Picard 2.17.3, with the error message below. I'm wondering whether it's the same issue and whether it has been addressed yet.

Exception in thread "main" picard.PicardException: Second read from pair not found in unmapped bam: lib1:100034, lib1:106131
    at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:394)
    at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:181)
    at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:282)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:228)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)

nh13 commented 6 years ago

@jleinenbach @morgancolp A good (small) test case (unmapped SAM and mapped SAM, or the command used to go from unmapped BAM to mapped BAM as input to MergeBamAlignment) would be immensely useful to the developers.
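
When assembling such a test case, it may help to first check whether the unmapped and aligned files agree on pairing. The sketch below is not part of Picard: `count_pairing` is a hypothetical shell function that reads SAM text on standard input and counts records with and without the paired flag (bit 0x1 of the FLAG column), skipping header lines that start with `@`. It assumes tab-separated SAM text and an `awk` on the PATH.

```shell
# Hypothetical helper: count paired vs. unpaired records in SAM text on stdin.
count_pairing() {
  awk -F'\t' '
    /^@/ { next }                                     # skip SAM header lines
    { if ($2 % 2 == 1) paired++; else unpaired++ }    # FLAG bit 0x1 set means the read is paired
    END { printf "paired=%d unpaired=%d\n", paired, unpaired }
  '
}
```

For example, running `samtools view -h aligned.bam | count_pairing` and then the same on the unmapped BAM gives two counts to compare. If the unmapped BAM is fully paired but the aligned file reports unpaired records, that difference is worth including in the report.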

fangling0913 commented 4 years ago

I have hit the same issue using Picard 2.23.3, with the error message below. My command:

java -Xmx8G -jar /data/software/ONCO/picard/picard.jar MergeBamAlignment \
    R=/mnt/pipeline-reference-data/hg_fasta/hg19_fasta_1/hg19.fa \
    UNMAPPED=${out}/${sample}.umi.ubam \
    ALIGNED=${out}/${sample}.umi.bam \
    O=${out}/${sample}.umi.merged.bam \
    CREATE_INDEX=true \
    MAX_GAPS=-1 \
    ALIGNER_PROPER_PAIR_FLAGS=true \
    SORT_ORDER=coordinate \
    ATTRIBUTES_TO_RETAIN=XS

INFO 2020-07-29 21:15:39 SamAlignmentMerger Processing SAM file(s): [/data/ONCO/development/projects/fangling/20200608_cfDNA_duplex/09900004-100k.umi.bam]
WARNING 2020-07-29 21:15:39 SamAlignmentMerger Exception merging bam alignment - attempting to sort aligned reads and try again: Inappropriate call if not paired read
INFO 2020-07-29 21:15:39 SamAlignmentMerger Finished reading 50037 total records from alignment SAM/BAM.
[Wed Jul 29 21:15:40 CST 2020] picard.sam.MergeBamAlignment done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=2058354688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.IllegalStateException: Inappropriate call if not paired read
    at htsjdk.samtools.SAMRecord.requireReadPaired(SAMRecord.java:892)
    at htsjdk.samtools.SAMRecord.getProperPairFlag(SAMRecord.java:900)
    at picard.sam.AbstractAlignmentMerger.setValuesFromAlignment(AbstractAlignmentMerger.java:905)
    at picard.sam.AbstractAlignmentMerger.transferAlignmentInfoToFragment(AbstractAlignmentMerger.java:680)
    at picard.sam.AbstractAlignmentMerger.transferAlignmentInfoToPairedRead(AbstractAlignmentMerger.java:752)
    at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:481)
    at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
    at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:366)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

yfarjoun commented 4 years ago

See @nh13's comment from 2018.

fangling0913 commented 4 years ago

@nh13 @yfarjoun The unmapped bam and mapped bam: 09900004-100k.umi.bam.zip 09900004-100k.umi.ubam.zip

my command lines:

#!/bin/bash

fq_in=/data/ONCO/development/projects/fangling/20200608_cfDNA_duplex
out=/data/ONCO/development/projects/fangling/20200608_cfDNA_duplex

BWA=/mnt/pipeline-programs/bwa/bwa-0.7.16/bwa
samtools=/data/software/ONCO/samtools-1.10/samtools

sample=09900004-100k
fq1=${fq_in}/CA20051502-100k-09900004D12P20_combined_R1.fastq.gz
fq2=${fq_in}/CA20051502-100k-09900004D12P20_combined_R2.fastq.gz

java -Xmx8G -jar /data/software/ONCO/picard/picard.jar FastqToSam \
    FASTQ=$fq1 \
    FASTQ2=$fq2 \
    OUTPUT=${out}/${sample}.ubam \
    SAMPLE_NAME=${sample}

java -jar /data/software/ONCO/fgbio-1.1.0.jar ExtractUmisFromBam \
    --input=${out}/${sample}.ubam \
    --output=${out}/${sample}.umi.ubam \
    --read-structure=5M70T 5M70T \
    --single-tag=RX \
    --molecular-index-tags=ZA ZB

$samtools fastq ${out}/${sample}.umi.ubam | \
    $BWA mem -t 24 -M -R "@RG\tID:${sample}\tSM:${sample}\tPL:Illumina" /mnt/pipeline-reference-data/hg_fasta/hg19_fasta_1/hg19.fa /dev/stdin | \
    $samtools view -b -@ 24 > ${out}/${sample}.umi.bam

java -Xmx8G -jar /data/software/ONCO/picard/picard.jar MergeBamAlignment \
    R=/mnt/pipeline-reference-data/hg_fasta/hg19_fasta_1/hg19.fa \
    UNMAPPED=${out}/${sample}.umi.ubam \
    ALIGNED=${out}/${sample}.umi.bam \
    O=${out}/${sample}.umi.merged.bam \
    CREATE_INDEX=true \
    MAX_GAPS=-1 \
    ALIGNER_PROPER_PAIR_FLAGS=true \
    SORT_ORDER=coordinate \
    ATTRIBUTES_TO_RETAIN=XS

fangling0913 commented 4 years ago

@nh13 @yfarjoun Have you found any problems with my data or my command lines?

jp3117 commented 4 years ago

I had the same issue using "picard.sam.MergeBamAlignment" (version 2.23.3):

INFO 2020-08-21 16:39:56 SamAlignmentMerger Read 32000000 records from alignment SAM/BAM.
INFO 2020-08-21 16:39:59 SamAlignmentMerger Finished reading 32212150 total records from alignment SAM/BAM.
[Fri Aug 21 16:40:00 EDT 2020] picard.sam.MergeBamAlignment done. Elapsed time: 4.82 minutes.
Runtime.totalMemory()=12014059520
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Second read from pair not found in unmapped bam: NB552042:81:HMNGLAFXY:1:11101:7856:1047, NB552042:81:HMNGLAFXY:4:21611:5731:11875
    at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:432)
    at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
    at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:366)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:301)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

The command used (most options were left at their defaults):

java -jar picard/build/libs/picard.jar MergeBamAlignment UNMAPPED_BAM=sorted.unmapped.umi.bam ALIGNED_BAM=[aligned.sam] OUTPUT=Merge_Alignments.bam MAX_INSERTIONS_OR_DELETIONS=-1 ALIGNER_PROPER_PAIR_FLAGS=true SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true REFERENCE_SEQUENCE=UCSC_hg19/genome.fa ADD_PG_TAG_TO_READS=true PAIRED_RUN=true CLIP_ADAPTERS=true IS_BISULFITE_SEQUENCE=false ALIGNED_READS_ONLY=false ATTRIBUTES_TO_REVERSE=[OQ, U2] ATTRIBUTES_TO_REVERSE_COMPLEMENT=[E2, SQ] READ1_TRIM=0 READ2_TRIM=0 PRIMARY_ALIGNMENT_STRATEGY=BestMapq CLIP_OVERLAPPING_READS=true HARD_CLIP_OVERLAPPING_READS=false INCLUDE_SECONDARY_ALIGNMENTS=true ADD_MATE_CIGAR=true UNMAP_CONTAMINANT_READS=false MIN_UNCLIPPED_BASES=32 MATCHING_DICTIONARY_TAGS=[M5, LN] UNMAPPED_READ_STRATEGY=DO_NOT_CHANGE VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false

bamorim-bio commented 2 years ago

Hello, I am having the same issue:

  java -jar /Users/beatrizamorim/Desktop/mtDNA/picard.jar MergeBamAlignment \
    ALIGNED_BAM=/Volumes/WD_beatriz/HUMANEVOL/DEMI/mtDNA/aln/pr.mapped.clean.s.ZIM56.aln2.bam \
    UNMAPPED_BAM=/Volumes/WD_beatriz/HUMANEVOL/DEMI/mtDNA/aln/rev.pr.mapped.clean.s.ZIM56.aln2.bam \
    OUTPUT=/Volumes/WD_beatriz/HUMANEVOL/DEMI/mtDNA/aln/merged.ZIM56.aln2.bam \
    REFERENCE_SEQUENCE=/Users/beatrizamorim/Desktop/mtDNA/mtDNA_crs.fa \
    PAIRED_RUN=True \
    CREATE_INDEX=true \
    SORT_ORDER=coordinate \
    VALIDATION_STRINGENCY=SILENT

[Tue Dec 07 17:26:45 WET 2021] Executing as beatrizamorim@BeatrizsMBP2021.Home on Mac OS X 10.16 x86_64; OpenJDK 64-Bit Server VM 11.0.9.1+1-LTS; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.26.4
INFO    2021-12-07 17:26:45 SamAlignmentMerger  Processing SAM file(s): [/Volumes/WD_beatriz/HUMANEVOL/DEMI/mtDNA/aln/pr.mapped.clean.s.ZIM56.aln2.bam]
WARNING 2021-12-07 17:26:46 SamAlignmentMerger  Exception merging bam alignment - attempting to sort aligned reads and try again: Underlying iterator is not queryname sorted: A00159:872:H5YF7DSX2:4:2548:14082:12164 1/2 150b aligned to lcl|NC_012920.1_cds_YP_003024026.1_1:1-150. > A00159:872:H5YF7DSX2:4:1364:18747:18740 1/2 150b aligned to lcl|NC_012920.1_cds_YP_003024026.1_1:1-150.
INFO    2021-12-07 17:26:50 SamAlignmentMerger  Read 1000000 records from alignment SAM/BAM.
INFO    2021-12-07 17:26:52 SamAlignmentMerger  Finished reading 1235710 total records from alignment SAM/BAM.
[Tue Dec 07 17:26:53 WET 2021] picard.sam.MergeBamAlignment done. Elapsed time: 0.14 minutes.
Runtime.totalMemory()=35651584
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Second read from pair not found in unmapped bam: A00159:872:H5YF7DSX2:4:2548:14082:12164, A00159:872:H5YF7DSX2:4:1364:18747:18740
    at picard.sam.AbstractAlignmentMerger.mergeAlignment(AbstractAlignmentMerger.java:432)
    at picard.sam.SamAlignmentMerger.mergeAlignment(SamAlignmentMerger.java:186)
    at picard.sam.MergeBamAlignment.doWork(MergeBamAlignment.java:368)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

I have tried several versions of Picard (1.119/2.26.6/2.26.4).

My pipeline was the following:
--aligned with BWA
--ran CleanSam
--extracted only the mapped reads with Samtools
--created the unaligned BAM file with RevertSamSpark

Do you have any update on how to solve this error?