ashwini06 opened this issue 4 years ago (status: Open).
Does the mutect2 directory exist in your current working directory?
Mutect2 is installed in a conda environment, and my working directory is different from that path.
@ashwini06 Could you post the entire command line you are using, some of it appears to have been cut off.
@ashwini06 Following up on this to see if you are still experiencing problems. If so, could you post the entire command line?
@fleharty: Thanks for the follow-up, and sorry I missed your previous reply. Yes, the problem with Mutect2 still exists.
gatk4 is installed in my conda environment:
$conda list | grep 'gatk'
gatk4 4.1.8.0 py38h37ae868_0 bioconda
Here is my full command line:
gatk Mutect2 --reference /home/proj/stage/cancer/reference/GRCh37/genome/human_g1k_v37_decoy.fasta --input consensus/concatenated_ACC5611A1_XXXXXX_consensusalign_ds.bam --output mutect2/concatenated_ACC5611A1_XXXXXX_mutect2_unfiltered_ds.vcf.gz
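Because `--output` in the command above is a relative path under `mutect2/`, that directory needs to exist in the directory the command is run from, which seems to be what the first reply was asking about; writing the VCF may fail if it is missing. A minimal sketch for ruling this out, using a hypothetical helper `ensure_output_dir` (it only prepares the path and does not call GATK):

```python
import os


def ensure_output_dir(output_path):
    """Create the parent directory of output_path if it does not exist yet.

    Hypothetical helper for illustration only; it prepares the path that is
    later passed to Mutect2's --output argument and does not call GATK.
    Returns True once the parent directory exists.
    """
    parent = os.path.dirname(output_path)
    if parent:  # "" means the current working directory, which always exists
        os.makedirs(parent, exist_ok=True)
    return os.path.isdir(parent or ".")
```

Called as `ensure_output_dir("mutect2/concatenated_ACC5611A1_XXXXXX_mutect2_unfiltered_ds.vcf.gz")` from the same working directory as the gatk command, it does the same thing as `mkdir -p mutect2` in the shell.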
@avalind This appears to be a different error from the one you were previously encountering. The current error indicates that something is wrong with your BAM: the length of the insertion qualities does not appear to match the read length.
Is there a way that you can share your bam?
Also, are you sure you intend to have insertion and deletion qualities? This is something we haven't used for a few years now.
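One way to look for the mismatch described above is to compare the length of the BI (base insertion quality) and BD (base deletion quality) tags with the length of SEQ for every record. The sketch below assumes the input is SAM text, for example the output of `samtools view` on the BAM; the function names are hypothetical, and a SEQ of `*` is treated as a zero-length read.

```python
def indel_qual_length_problems(sam_line):
    """Return messages for BI/BD tags whose length differs from the read length.

    Expects one alignment line in SAM text format. A SEQ of "*" is treated
    as a zero-length read.
    """
    fields = sam_line.rstrip("\n").split("\t")
    qname, seq = fields[0], fields[9]
    read_len = 0 if seq == "*" else len(seq)
    # Optional tags start at column 12 and look like TAG:TYPE:VALUE.
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    problems = []
    for tag in ("BI", "BD"):  # base insertion / base deletion qualities
        if tag in tags and len(tags[tag]) != read_len:
            problems.append(
                f"{qname}: {tag} length {len(tags[tag])} != read length {read_len}"
            )
    return problems


def scan_sam_lines(lines):
    """Collect problems from an iterable of SAM lines, skipping header lines."""
    problems = []
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue
        problems.extend(indel_qual_length_problems(line))
    return problems
```

For example, piping `samtools view consensus/concatenated_ACC5611A1_XXXXXX_consensusalign_ds.bam` into a script that calls `scan_sam_lines(sys.stdin)` would list each offending read by name.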
@fleharty : You can download the bam file using the shared link.
https://ki.box.com/s/b9fe0854eccclz85vvkktd2qfqquyq71
Regarding "Also, are you sure you intend to have insertion and deletion qualities? This is something we haven't used for a few years now.": in my workflow, these BAM files were generated using Sentieon bwa-mem with the default options. Are there any suggestions on how to run Mutect2 successfully on this BAM file?
@ashwini06
This bam appears to be malformed and it fails Picard ValidateSamFile. I think you'll need to examine the earlier stages of your pipeline that produce your bam to ensure you get a correctly formed bam. I'm going to close this ticket now since this doesn't appear to be an issue with Mutect2.
(base) wm462-624:Downloads fleharty$ java -jar $PICARD ValidateSamFile I=concatenated_ACC5611A1_XXXXXX_consensusalign_ds.bam
INFO 2020-07-14 11:25:52 ValidateSamFile
** NOTE: Picard's command line syntax is changing.
** For more information, please see: ** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
** The command line looks like this in the new syntax:
** ValidateSamFile -I concatenated_ACC5611A1_XXXXXX_consensusalign_ds.bam
11:25:52.673 INFO NativeLibraryLoader - Loading libgkl_compression.dylib from jar:file:/Users/fleharty/resources/picard.jar!/com/intel/gkl/native/libgkl_compression.dylib
[Tue Jul 14 11:25:52 EDT 2020] ValidateSamFile INPUT=concatenated_ACC5611A1_XXXXXX_consensusalign_ds.bam MODE=VERBOSE MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Jul 14 11:25:52 EDT 2020] Executing as fleharty@wm462-624 on Mac OS X 10.15.5 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.20.4-SNAPSHOT
WARNING 2020-07-14 11:25:52 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
ERROR: Record 18321, Read name UMI-ATT-GAA-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 26312, Read name UMI-CCT-TTC-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 70755, Read name UMI-CAG-GGA-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 145082, Read name UMI-AAC-ATG-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 181500, Read name UMI-ACT-CTT-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 186837, Read name UMI-CAA-CTC-4, Zero-length read without FZ, CS or CQ tag
ERROR: Record 186862, Read name UMI-CGC-GCC-0, Zero-length read without FZ, CS or CQ tag
ERROR: Record 186904, Read name UMI-AGG-GTC-0, Zero-length read without FZ, CS or CQ tag
ERROR: Record 186919, Read name UMI-CGC-TGC-0, Zero-length read without FZ, CS or CQ tag
ERROR: Record 186947, Read name UMI-TAA-TAG-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 186970, Read name UMI-GAG-GCC-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 186972, Read name UMI-TAT-TTC-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 186985, Read name UMI-ACG-TAA-6, Zero-length read without FZ, CS or CQ tag
ERROR: Record 186995, Read name UMI-CTT-GCA-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 187006, Read name UMI-CTA-GGG-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 187037, Read name UMI-AGT-CTG-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 187061, Read name UMI-CAT-GGT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 187074, Read name UMI-AAA-CGT-4, Zero-length read without FZ, CS or CQ tag
ERROR: Record 187110, Read name UMI-ACG-TAG-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 187121, Read name UMI-CCG-GCC-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 187154, Read name UMI-CAA-CTG-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 187181, Read name UMI-CGG-GAG-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 187209, Read name UMI-CAA-GTT-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 279812, Read name UMI-ACT-GGT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 327672, Read name UMI-AGT-CGG-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 367457, Read name UMI-GGA-TTA-6, Zero-length read without FZ, CS or CQ tag
ERROR: Record 441607, Read name UMI-AGA-GTC-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 481504, Read name UMI-AAC-TCT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 481532, Read name UMI-AAT-CAA-8, Zero-length read without FZ, CS or CQ tag
ERROR: Record 481722, Read name UMI-ATA-ATT-10, Zero-length read without FZ, CS or CQ tag
ERROR: Record 481989, Read name UMI-CGA-CTA-8, Zero-length read without FZ, CS or CQ tag
ERROR: Record 482114, Read name UMI-GAG-TAA-8, Zero-length read without FZ, CS or CQ tag
ERROR: Record 482150, Read name UMI-GCC-GTA-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 482210, Read name UMI-GGT-TCC-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 482222, Read name UMI-GTA-GTT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 482251, Read name UMI-GTT-TAC-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 541693, Read name UMI-AGG-GAG-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 763643, Read name UMI-GAG-TAT-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 763881, Read name UMI-AGC-TTT-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 764724, Read name UMI-AAT-ATA-14, Zero-length read without FZ, CS or CQ tag
ERROR: Record 764749, Read name UMI-GCT-GTG-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 764766, Read name UMI-AGC-TAG-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 764858, Read name UMI-AGA-GGT-4, Zero-length read without FZ, CS or CQ tag
ERROR: Record 764950, Read name UMI-CTT-GCC-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765124, Read name UMI-CGG-TGT-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765139, Read name UMI-GGA-GTC-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765157, Read name UMI-ATA-CTC-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765213, Read name UMI-AGC-TCT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765249, Read name UMI-AAG-GAT-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765281, Read name UMI-AAG-ACT-8, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765385, Read name UMI-CGA-CGT-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765535, Read name UMI-GGG-TTG-10, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765582, Read name UMI-ATG-TAA-6, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765607, Read name UMI-CCG-CTA-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765620, Read name UMI-AAA-ATT-16, Zero-length read without FZ, CS or CQ tag
ERROR: Record 765717, Read name UMI-AGG-TAT-8, Zero-length read without FZ, CS or CQ tag
ERROR: Record 766523, Read name UMI-GAA-GGA-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 822437, Read name UMI-AGA-CCT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 936121, Read name UMI-CGA-TTT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 964359, Read name UMI-ACT-TAA-16, Zero-length read without FZ, CS or CQ tag
ERROR: Record 965939, Read name UMI-GCA-GTT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 965956, Read name UMI-AAA-ATA-37, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966315, Read name UMI-CTC-GAG-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966349, Read name UMI-ACT-GTT-8, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966385, Read name UMI-ATT-GCA-10, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966397, Read name UMI-ACC-CGG-4, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966402, Read name UMI-CAG-TGT-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966417, Read name UMI-CCG-CCT-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966450, Read name UMI-CCC-GAT-4, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966462, Read name UMI-CCG-TCT-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966487, Read name UMI-GAT-GTT-4, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966491, Read name UMI-GTG-TTG-3, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966501, Read name UMI-AGA-ATG-4, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966509, Read name UMI-AGT-GGT-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966514, Read name UMI-ATC-GGA-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966517, Read name UMI-GAT-TGA-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966538, Read name UMI-ATA-GGG-23, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966542, Read name UMI-GTG-TAG-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966591, Read name UMI-CCG-TAT-6, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966596, Read name UMI-GTT-GTT-3-D2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966613, Read name UMI-ACC-GAC-1, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966616, Read name UMI-ACG-TGG-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966618, Read name UMI-ACT-GGG-11, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966620, Read name UMI-ACT-GGG-12, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966627, Read name UMI-GGC-TGT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966674, Read name UMI-CCT-GTC-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966699, Read name UMI-CCG-TGA-4, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966722, Read name UMI-AGG-TGT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966742, Read name UMI-CCG-TCA-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966752, Read name UMI-GAA-GAT-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966784, Read name UMI-CCT-TAT-12, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966875, Read name UMI-AGG-GGG-10, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966887, Read name UMI-AGG-CCG-5, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966916, Read name UMI-GCT-TCG-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966939, Read name UMI-CAA-TGT-8, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966989, Read name UMI-GAA-TCA-7, Zero-length read without FZ, CS or CQ tag
ERROR: Record 966991, Read name UMI-TAG-TGT-2, Zero-length read without FZ, CS or CQ tag
ERROR: Record 967245, Read name UMI-AAG-ATT-8, Zero-length read without FZ, CS or CQ tag
ERROR: Record 975151, Read name UMI-ACT-CCC-4, Zero-length read without FZ, CS or CQ tag
ERROR: Record 1064783, Read name UMI-GGA-GGT-6, Zero-length read without FZ, CS or CQ tag
Maximum output of [100] errors reached.
[Tue Jul 14 11:25:59 EDT 2020] picard.sam.ValidateSamFile done. Elapsed time: 0.12 minutes.
Runtime.totalMemory()=1450180608
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
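With MAX_OUTPUT=100 the VERBOSE log stops after 100 entries; ValidateSamFile's MODE=SUMMARY instead reports counts per error type. If you want the flagged read names themselves, for example to inspect those records in the BAM, a small parser over a VERBOSE log can collect them. This is a sketch that assumes one ERROR entry per line, in the `ERROR: Record N, Read name X, message` shape shown above, and read names without commas; the function names are hypothetical.

```python
import re
from collections import Counter

# One VERBOSE-mode entry, e.g.
# "ERROR: Record 18321, Read name UMI-ATT-GAA-2, Zero-length read without FZ, CS or CQ tag"
ERROR_RE = re.compile(r"ERROR: Record (\d+), Read name ([^,]+), (.*)$")


def parse_validation_errors(log_text):
    """Return (record_number, read_name, message) tuples, one per ERROR line."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            errors.append((int(match.group(1)), match.group(2), match.group(3)))
    return errors


def count_by_message(errors):
    """Count how often each error message occurs."""
    return Counter(message for _, _, message in errors)
```

Non-ERROR lines (INFO, WARNING, the "Maximum output" notice) are simply skipped.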
@avalind I got an e-mail saying that you ran Picard and had no errors, but I don't see that comment here.
@fleharty I think you meant to tag @ashwini06 (the creator of this issue). I also received that email, maybe @ashwini06 deleted the comment shortly after posting it?
@fleharty @avalind Sorry, something went wrong with my previous message. What I wrote was that I couldn't reproduce the same error message using Picard ValidateSamFile.
I validated my BAM file and don't see any errors, and samtools flagstat also runs fine on it. Please find the attached screenshots.
Do you still think my BAM file is malformed?
PS: @fleharty used Picard 2.20.4-SNAPSHOT, whereas I ran ValidateSamFile with Picard 2.23.2.
Bumping this, since I ran into the same error while helping QC a colleague's data. Running GATK 4.1.8.1 produces the following:
https://www.dropbox.com/s/2uleabl53dmg9y3/Screenshot%202020-07-28%2000.35.45.png
This is targeted capture data (Twist custom capture) run through our core facility's Sentieon pipeline, using the 'consensus' reads mapped to 1kg_grch37; using the raw reads works fine. I'm not very familiar with Sentieon's pipelines, but the steps to generate the UMI consensus reads are described at https://support.sentieon.com/appnotes/umi/.
At first I thought the discrepancy between @fleharty's ValidateSamFile results and yours, @ashwini06, could be because the newer version of Picard uses an updated version of htsjdk (v2.23.0). However, that is the same htsjdk version included in GATK 4.1.8.1, so it seems unlikely. Walking through the commits between Picard 2.22.8 (the one bundled with GATK 4.1.8.1) and 2.23.2 doesn't show (at least at first glance) any commits changing code that could explain the difference in behaviour.
After more digging, it seems that for partial alignments (i.e. reads with hard-clipped bases), the BD and BI tags that Sentieon copies straight from the consensus FASTQ aren't trimmed to the actual length of the aligned sequence. They are therefore too long, and this is what causes the problem.
As these are non-standard tags, the SAM/BAM format specification says nothing about whether their length must equal that of the aligned segment. Still, it clearly makes no sense to have quality data for bases that are not part of the alignment (i.e. hard-clipped bases), so IMHO the solution here would be for Sentieon to fix their tool.
I've written a small utility that trims the BD and BI tags (based on the CIGAR string) to the same length as the aligned segment of the read: https://github.com/avalind/doppelganger
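The trimming idea described above can be sketched in a few lines. This is not the doppelganger code, only an illustration under two stated assumptions: the BI/BD value originally covered the whole read, hard-clipped bases included, and it runs in the same left-to-right orientation as SEQ. That second assumption may not hold for reverse-strand reads if the tags were copied from the FASTQ unreversed, so check a few such reads before trusting it. The function names are hypothetical.

```python
import re

# CIGAR operations that consume bases stored in SEQ. D and N consume
# reference only; H and P consume neither.
QUERY_OPS = set("MIS=X")
CIGAR_RE = re.compile(r"(?:\d+[MIDNSHP=X])+")
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP_RE.findall(cigar)]


def trim_to_alignment(tag_value, cigar):
    """Drop the characters of a per-base tag (e.g. BI/BD) that belong to
    hard-clipped bases, so that its length matches SEQ.

    Assumes the tag originally covered the whole read, hard-clipped bases
    included, in the same left-to-right orientation as SEQ.
    """
    ops = parse_cigar(cigar)
    lead = ops[0][0] if ops[0][1] == "H" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "H" else 0
    seq_len = sum(n for n, op in ops if op in QUERY_OPS)
    if len(tag_value) == seq_len:
        return tag_value  # already consistent with SEQ
    if len(tag_value) != lead + seq_len + trail:
        raise ValueError(
            f"tag length {len(tag_value)} fits neither SEQ ({seq_len}) "
            f"nor SEQ plus hard clips ({lead + seq_len + trail})"
        )
    return tag_value[lead:lead + seq_len]
```

Applied to a BAM, for example with pysam, the inputs would come from `read.cigarstring` and `read.get_tag("BI")`, and the trimmed value could be written back with `read.set_tag("BI", trimmed, value_type="Z")`; the same applies to BD.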
I'm having problems running GATK Mutect2.
GATK version:
Command line:
gatk Mutect2 -R /home/proj/stage/cancer/reference/GRCh37/genome/human_g1k_v37_decoy.fasta -L /home/proj/stage/cancer/reference/target_capture_bed/production/balsamic/gicfdna_3.1_hg1
Error: