broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License

Picard 2.26.2 ExtractIlluminaBarcodes fails with "No barcodes have been specified." #1728

Open: myourshaw opened this issue 2 years ago

myourshaw commented 2 years ago

Bug Report

Affected tool(s)

ExtractIlluminaBarcodes

Affected version(s)

2.26.2

Description

The 2.26.2 Docker image fails almost immediately, printing this error twice: "No barcodes have been specified."

The 2.25.7 Docker image runs successfully on the same platforms with exactly the same inputs.

Steps to reproduce

This fails both in a DNAnexus workflow and when running the app on a desktop:

    java "-Xmx60g" -jar /usr/picard/picard.jar \
      ExtractIlluminaBarcodes \
      --BASECALLS_DIR 210825_NS500652_0453_AHK5LTBGXH/Data/Intensities/BaseCalls \
      --BARCODE_FILE /home/dnanexus/inputs/input7977559225486846035/stdout \
      --READ_STRUCTURE 146T8B9M8B146T \
      --LANE 4 \
      --METRICS_FILE 210825_NS500652_0453_AHK5LTBGXH.4.IlluminaBarcodesMetrics.txt \
      --OUTPUT_DIR barcodes \
      --DISTANCE_MODE "HAMMING" \
      --MAX_MISMATCHES 0 \
      --MAX_NO_CALLS 2 \
      --MIN_MISMATCH_DELTA 1 \
      --MINIMUM_BASE_QUALITY 0 \
      --NUM_PROCESSORS 0
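The READ_STRUCTURE 146T8B9M8B146T in this command describes 146 template cycles (T), an 8-cycle sample barcode (B), a 9-cycle molecular barcode or UMI (M), a second 8-cycle sample barcode (B), and another 146 template cycles. The sketch below splits such a string into segments and returns the sample-barcode lengths, on the assumption that each barcode column in the BARCODE_FILE has to match the length of one B segment. `ReadStructureSketch` and `barcodeLengths` are hypothetical names for illustration; this is not Picard's own read-structure parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch, NOT Picard's own read-structure parser.
class ReadStructureSketch {
    // One segment = a cycle count followed by a type letter:
    // T = template, B = sample barcode, M = molecular barcode (UMI), S = skip.
    private static final Pattern SEGMENT = Pattern.compile("(\\d+)([TBMS])");

    /** Returns the cycle counts of the sample-barcode (B) segments, in order. */
    static List<Integer> barcodeLengths(String readStructure) {
        List<Integer> lengths = new ArrayList<>();
        Matcher m = SEGMENT.matcher(readStructure);
        int consumed = 0;
        while (m.find()) {
            // Reject anything between segments, e.g. an unknown type letter.
            if (m.start() != consumed) {
                throw new IllegalArgumentException("Cannot parse read structure: " + readStructure);
            }
            consumed = m.end();
            if (m.group(2).equals("B")) {
                lengths.add(Integer.parseInt(m.group(1)));
            }
        }
        // Reject trailing characters that did not form a complete segment.
        if (consumed != readStructure.length()) {
            throw new IllegalArgumentException("Cannot parse read structure: " + readStructure);
        }
        return lengths;
    }

    public static void main(String[] args) {
        // The 9M UMI segment is not counted as part of either sample barcode.
        System.out.println(barcodeLengths("146T8B9M8B146T")); // prints [8, 8]
    }
}
```

Under this reading, barcode_sequence_1 and barcode_sequence_2 would each be 8 bases long, and the 9 UMI cycles would not appear in the barcode file at all.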

Expected behavior

The tool should run and create the barcode and metrics files.

Actual behavior

The tool prints its help text followed by the error message, which suggests it never starts processing the input.

Contents of the input BARCODE_FILE:

barcode_sequence_1  barcode_sequence_2  barcode_name          library_name
CCTTGATCNNNNNNNNN   GATGGAGT            UMI_IDT-25.i5_IDT-72  08232021JH_ST
TGAAGACGNNNNNNNNN   ACATGCCA            UMI_IDT-44.i5_IDT-53  08232021JH_ST
GTTACGCANNNNNNNNN   ATGGCGAT            UMI_IDT-45.i5_IDT-52  08232021JH_ST
AGCGTGTTNNNNNNNNN   CTTCGCAA            UMI_IDT-46.i5_IDT-51  08232021JH_ST
ACAGCTCANNNNNNNNN   TACTGCTC            UMI_IDT-48.i5_IDT-49  08232021JH_ST
CATGGCTANNNNNNNNN   CACACATC            UMI_IDT-34.i5_IDT-63  08232021JH_ST
ATGCCTGTNNNNNNNNN   AGATTGCG            UMI_IDT-35.i5_IDT-62  08232021JH_ST
CAACACCTNNNNNNNNN   AGCTACCA            UMI_IDT-36.i5_IDT-61  08232021JH_ST
TGTGACTGNNNNNNNNN   AGCCTATC            UMI_IDT-37.i5_IDT-60  08232021JH_ST
GTCATCGANNNNNNNNN   GATCCACT            UMI_IDT-38.i5_IDT-59  08232021JH_ST
AGCACTTCNNNNNNNNN   ACGTCCAA            UMI_IDT-39.i5_IDT-58  08232021JH_ST
CTGATCGTNNNNNNNNN   GCGCATAT            UMI_IDT-1.i5_IDT-96   08232021JH_ST
GTTGTTCGNNNNNNNNN   CCAACTTC            UMI_IDT-41.i5_IDT-56  08232021JH_ST
CGGTTGTTNNNNNNNNN   GTGGTATG            UMI_IDT-42.i5_IDT-55  08232021JH_ST
ACTGAGGTNNNNNNNNN   GTCAACAG            UMI_IDT-43.i5_IDT-54  08232021JH_ST
GTCGAAGANNNNNNNNN   CCACATTG            UMI_IDT-26.i5_IDT-71  08232021JH_ST
ACCACGATNNNNNNNNN   GTCTGCAA            UMI_IDT-27.i5_IDT-70  08232021JH_ST
GATTACCGNNNNNNNNN   TTGGACTG            UMI_IDT-28.i5_IDT-69  08232021JH_ST
GCACAACTNNNNNNNNN   CTGAACGT            UMI_IDT-29.i5_IDT-68  08232021JH_ST
GCGTCATTNNNNNNNNN   CAGACGTT            UMI_IDT-30.i5_IDT-67  08232021JH_ST
ATCCGGTANNNNNNNNN   GACCGATA            UMI_IDT-31.i5_IDT-66  08232021JH_ST
CGTTGCAANNNNNNNNN   ATAGAGCG            UMI_IDT-32.i5_IDT-65  08232021JH_ST
GTGAAGTGNNNNNNNNN   GAGCAATC            UMI_IDT-33.i5_IDT-64  08232021JH_ST

gbggrant commented 2 years ago

Hi @myourshaw, we believe the issue is with barcode_sequence_1: the code does not expect to find any Ns in the sequence. The sequence should contain ONLY the 8 bases of the barcode (CCTTGATC in the first line). Unfortunately, the error message is not very helpful; we will work on improving the error logging here.

If you remove the 9 Ns from the barcode_sequence_1 entries in your file, ExtractIlluminaBarcodes should run properly.
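A minimal sketch of that workaround, assuming a tab-delimited barcode file whose first line is a header containing a barcode_sequence_1 column, and an 8-base first barcode (matching the first 8B segment of the READ_STRUCTURE used here). `StripUmiNs` and `stripTrailingNs` are hypothetical names, not part of Picard. The helper removes only the trailing run of Ns and fails loudly if what remains is not the expected length, rather than silently writing a bad barcode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Picard. Assumes a tab-delimited barcode
// file whose first line is a header containing "barcode_sequence_1".
class StripUmiNs {

    /** Trims trailing Ns from barcode_sequence_1 and checks the remaining length. */
    static List<String> stripTrailingNs(List<String> lines, int expectedLength) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        String header = lines.get(0);
        int col = Arrays.asList(header.split("\t", -1)).indexOf("barcode_sequence_1");
        if (col < 0) {
            throw new IllegalArgumentException("Header has no barcode_sequence_1 column");
        }
        out.add(header); // header is passed through unchanged
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // drop blank lines
            }
            String[] fields = line.split("\t", -1);
            if (fields.length <= col) {
                throw new IllegalArgumentException("Too few columns: " + line);
            }
            // Only the trailing run of Ns is removed: CCTTGATCNNNNNNNNN -> CCTTGATC
            String trimmed = fields[col].replaceFirst("N+$", "");
            if (trimmed.length() != expectedLength) {
                throw new IllegalArgumentException(
                        "Expected " + expectedLength + " bases but got '" + trimmed + "' in: " + line);
            }
            fields[col] = trimmed;
            out.add(String.join("\t", fields));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "barcode_sequence_1\tbarcode_sequence_2\tbarcode_name\tlibrary_name",
                "CCTTGATCNNNNNNNNN\tGATGGAGT\tUMI_IDT-25.i5_IDT-72\t08232021JH_ST");
        // 8 matches the first 8B segment of READ_STRUCTURE 146T8B9M8B146T.
        for (String line : stripTrailingNs(input, 8)) {
            System.out.println(line);
        }
    }
}
```

To apply it to a real file, one could read the lines with `java.nio.file.Files.readAllLines`, pass them through `stripTrailingNs`, and write the result to a new file with `Files.write` before pointing `--BARCODE_FILE` at it.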

myourshaw commented 2 years ago

This problem still exists as of Picard 2.26.6. The last version that has worked properly for me (I haven't tried all the intermediate versions) is 2.25.7. It is important to get this fixed, because barcodes for IDT UMI adapters are specified like this:

barcode_sequence_1  barcode_sequence_2  barcode_name    library_name
AACTGAGCNNNNNNNNN   CAATCAGG    i7_IDT-18.i5_IDT-79 08182020JH_ST
CTTAGGACNNNNNNNNN   ACTCCTAC    i7_IDT-19.i5_IDT-78 08182020JH_ST
GTGCCATANNNNNNNNN   CTCCTAGT    i7_IDT-20.i5_IDT-77 08182020JH_ST