broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License
975 stars 369 forks source link

[Bug] v2.25 IlluminaBasecallsToSam doesn't write any reads! #1666

Closed jamestwebber closed 3 years ago

jamestwebber commented 3 years ago

Bug Report

Affected tool(s)

IlluminaBasecallsToSam

Affected version(s)

Description

I was trying to upgrade to the latest version of Picard and was confused when nothing showed up in my uBAM files. Everything works fine in version 2.24.1 but when I tried the latest (and also 2.25.0 and 2.25.1) nothing gets written.

Steps to reproduce

Command used (run on the UGER cluster):

java -Djava.io.tmpdir=[path to tmp] -XX:+UseParallelGC -XX:GCTimeLimit=20 -XX:GCHeapFreeLimit=10 \
 -Xms30g -Xmx30g -jar picard.jar  IlluminaBasecallsToSam --TMP_DIR [path to tmp] \
--VALIDATION_STRINGENCY SILENT --BASECALLS_DIR [path to basecalls] --LANE 1 \
--RUN_BARCODE HWM72DMXX --READ_STRUCTURE 42T8B60T --LIBRARY_PARAMS [...] \
--INCLUDE_NON_PF_READS false --APPLY_EAMSS_FILTER false --ADAPTERS_TO_CHECK null \
--SEQUENCING_CENTER BI --BARCODES_DIR [...] --TILE_LIMIT 1 --IGNORE_UNEXPECTED_BARCODES true

Expected behavior

When run with 2.24.1 or 2.24.2 I get ~200k reads in each of 23 BAM files... [edit: confirmed with 2.24.2 as well]

Actual behavior

With 2.25.* I get nothin'

akt68 commented 3 years ago

I get the same result with IlluminaBasecallsToFastq when demuxing a NextSeq 1000 run. With version 2.24.2 sequences are written to fastq files as expected, but with 2.25.2 all fastq files are empty. With 2.24.2 the log shows alternating BasecallsConverter Read and Write INFO lines, but with version 2.25.2 there are only SortedBasecallsConverter Read INFO lines.

tomkinsc commented 3 years ago

We've also been seeing this issue with Picard v2.25.2 when demultiplexing NovaSeq 6000 runs via IlluminaBasecallsToSam. The demux metrics do show plenty of PF reads, but the resulting uBAM files contain only a header.

As an example:

Invocation (via cromwell running a WDL workflow on Terra): picard -Xmx161052m -Djava.io.tmpdir=/cromwell_root/tmp.aa6ff4d2/tmp-illumina-illumina_demux-u0cpeccu IlluminaBasecallsToSam READ_STRUCTURE=101T10B10B101T ADAPTERS_TO_CHECK=PAIRED_END ADAPTERS_TO_CHECK=NEXTERA_V1 ADAPTERS_TO_CHECK=NEXTERA_V2 MAX_RECORDS_IN_RAM=2000000 NUM_PROCESSORS=32 INCLUDE_NON_PF_READS=False COMPRESSION_LEVEL=5 SORT=True RUN_START_DATE=10/25/2020 SEQUENCING_CENTER=A00764 BASECALLS_DIR=/cromwell_root/tmp.aa6ff4d2/tmp.XEkakOlbFK/seq/illumina_ext/SL-NVQ/201025_SL-NVQ_0282_BHTHC3DRXX/Data/Intensities/BaseCalls BARCODES_DIR=/cromwell_root/tmp.aa6ff4d2/tmp-illumina-illumina_demux-u0cpeccu/extracted_barcodes-97iffr24 LANE=1 RUN_BARCODE=HTHC3DRXX LIBRARY_PARAMS=/cromwell_root/tmp.aa6ff4d2/tmp-illumina-illumina_demux-u0cpeccu/library_params.HTHC3DRXX.1iwv5beh2.txt

Metrics for one of the samples: CCACGCTGAA-TATTCCTCAG CCACGCTGAATATTCCTCAG MA_MGH_02710 MA_MGH_02710.lERCC-00014_RandomPrimer-SSIV_NexteraXT_RNA_0352324857_cDNA_0369438774_LIB_13214803_A8 1690940 1690940 1690940 1690940 0 0 0.003342 0.303341 0.003342 0.303341 0.689321

Resulting bam file contains only a header:

@HD     VN:1.6  SO:queryname
@RG     ID:HTHC3.1      SM:MA_MGH_02710 LB:MA_MGH_02710.lERCC-00014_RandomPrimer-SSIV_NexteraXT_RNA_0352324857_cDNA_0369438774_LIB_13214803_A8  PL:ILLUMINA     PU:HTHC3DRXX.1.CCACGCTGAA-TATTCCTCAG    CN:A00764       DT:2020-10-25T00:00:00+0000
gbggrant commented 3 years ago

Thank you @jamestwebber @akt68 and @tomkinsc We are actively investigating this. As it appears this bug was introduced in 2.25.0 or later, we suggest using 2.24.2 for now.

gbggrant commented 3 years ago

Picard 2.25.3 was released this morning that should resolve this bug. Please let us know if you still have problems regarding this issue.