broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License
983 stars 368 forks source link

UnsortedBasecallsConverter java.lang.NullPointerException #1696

Open kylevoyto opened 3 years ago

kylevoyto commented 3 years ago

Bug Report

Affected tool(s)

IlluminaBasecallsToSam

Affected version(s)

Description

I get a java.lang.NullPointerException when I try to run IlluminaBasecallsToSam --SORT false.

Steps to reproduce

Here is the full command:

java -Xmx220g -Dpicard.useLegacyParser=false -jar \
  /igmprimary/bin/picard.jar IlluminaBasecallsToSam \
  --BASECALLS_DIR /path/to/run/Data/Intensities/BaseCalls \
  --LIBRARY_PARAMS library_params.txt \
  --BARCODES_DIR barcodes \
  --LANE 2 \
  --READ_STRUCTURE 151T8B9M8B151T \
  --INCLUDE_NON_PF_READS false \
  --IGNORE_UNEXPECTED_BARCODES true \
  --RUN_BARCODE run_barcode \
  --READ_GROUP_ID run_barcode.2 \
  --MOLECULAR_INDEX_TAG RX \
  --NUM_PROCESSORS 8 \
  --SEQUENCING_CENTER sequencing_center \
  --ADAPTERS_TO_CHECK null \
  --FIVE_PRIME_ADAPTER ACACTCTTTCCCTACACGACGCTCTTCCGATCT \
  --THREE_PRIME_ADAPTER AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
  --SORT false \
  --VERBOSITY DEBUG

I can provide more information, if needed.

Expected behavior

Run without error.

Actual behavior

20:20:29.844 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/igmprimary/bin/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Jul 07 20:20:29 UTC 2021] IlluminaBasecallsToSam --BASECALLS_DIR /path/to/run/Data/Intensities/BaseCalls --BARCODES_DIR barcodes --LANE 2 --RUN_BARCODE run_barcode --READ_GROUP_ID run_barcode.2 --SEQUENCING_CENTER sequencing_center --READ_STRUCTURE 151T8B9M8B151T --LIBRARY_PARAMS library_params.txt  --FIVE_PRIME_ADAPTER ACACTCTTTCCCTACACGACGCTCTTCCGATCT --THREE_PRIME_ADAPTER AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --NUM_PROCESSORS 8 --INCLUDE_NON_PF_READS false --IGNORE_UNEXPECTED_BARCODES true --MOLECULAR_INDEX_TAG RX --SORT false --VERBOSITY DEBUG --PLATFORM ILLUMINA --INCLUDE_BC_IN_RG_TAG false --APPLY_EAMSS_FILTER true --MAX_READS_IN_RAM_PER_TILE -1 --MINIMUM_QUALITY 2 --MOLECULAR_INDEX_BASE_QUALITY_TAG QX --BARCODE_POPULATION_STRATEGY ORPHANS_ONLY --INCLUDE_BARCODE_QUALITY false --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
[Wed Jul 07 20:20:29 UTC 2021] Executing as user@machine on Linux 4.14.186-110.268.amzn1.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_292-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:2.25.6
DEBUG   2021-07-07 20:20:29 BlockCompressedOutputStream Using deflater: IntelDeflater
DEBUG   2021-07-07 20:20:29 BlockCompressedOutputStream Using deflater: IntelDeflater
DEBUG   2021-07-07 20:20:29 BlockCompressedOutputStream Using deflater: IntelDeflater
DEBUG   2021-07-07 20:20:30 IlluminaDataProviderFactory The following file formats will be used by IlluminaDataProvider: 
INFO    2021-07-07 20:20:30 IlluminaBasecallsToSam  DONE_READING STRUCTURE IS 151T8B9M8B151T
INFO    2021-07-07 20:20:31 CbclReader  Processing tile 1101
[Wed Jul 07 20:20:49 UTC 2021] picard.illumina.IlluminaBasecallsToSam done. Elapsed time: 0.32 minutes.
Runtime.totalMemory()=4993843200
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.NullPointerException
    at picard.illumina.UnsortedBasecallsConverter.processTilesAndWritePerSampleOutputs(UnsortedBasecallsConverter.java:93)
    at picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:267)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Commentary

I looked shortly into the code differences between UnsortedBasecallsConverter and SortedBasecallsConverter. My very uneducated guess is that the error might be related to ignoreUnexpectedBarcodes? Thanks for the help.

mjhipp commented 3 years ago

@jacarey I get a similar issue when using an UnsortedBasecallsConverter

This is using a feature branch: #1649

This seems like it may be a non-deterministic race condition, because when I re-ran this workflow with the same input/command, it worked.

WARNING 2021-07-28 13:47:12 IlluminaBasecallsToSam  Setting deprecated parameter `MAX_READS_IN_RAM_PER_TILE` use ` MAX_RECORDS_IN_RAM` instead
13:47:12.486 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Jul 28 13:47:12 PDT 2021] IlluminaBasecallsToSam BARCODES_DIR=/output/analysis/basecalling/barcodes.2 RUN_BARCODE=xxxx_xxxx SEQUENCING_CENTER=Foo RUN_START_DATE=Fri Jul 23 00:00:00 PDT 2021 LIBRARY_PARAMS=/output/analysis/basecalling/library_params.2.txt ADAPTERS_TO_CHECK=[INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM, INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM] NUM_PROCESSORS=16 MAX_READS_IN_RAM_PER_TILE=300000 INCLUDE_NON_PF_READS=false SORT=false LANE=[2] READ_STRUCTURE=8M1S142T8B8B8M1S142T BASECALLS_DIR=/output/tmp/illumina_runflolder/Data/Intensities/BaseCalls TMP_DIR=[/output/tmp] VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true USE_JDK_DEFLATER=false USE_JDK_INFLATER=false    PLATFORM=ILLUMINA INCLUDE_BC_IN_RG_TAG=false APPLY_EAMSS_FILTER=true IGNORE_UNEXPECTED_BARCODES=false MOLECULAR_INDEX_TAG=RX MOLECULAR_INDEX_BASE_QUALITY_TAG=QX BARCODE_POPULATION_STRATEGY=ORPHANS_ONLY INCLUDE_BARCODE_QUALITY=false MATCH_BARCODES_INLINE=false DISTANCE_MODE=HAMMING MAX_MISMATCHES=1 MIN_MISMATCH_DELTA=1 MAX_NO_CALLS=2 MINIMUM_BASE_QUALITY=0 MINIMUM_QUALITY=2 COMPRESS_OUTPUTS=false VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Wed Jul 28 13:47:12 PDT 2021] Executing as root@ip-xxx.compute.internal on Linux 4.14.232-176.381.amzn2.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_302-b08; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.23.8-59-g7ff3a17-SNAPSHOT
INFO    2021-07-28 13:47:12 IlluminaBasecallsToSam  DONE_READING STRUCTURE IS 8M1S142T8B8B8M1S142T
INFO    2021-07-28 13:47:13 CbclReader  Processing tile 1101
ERROR   2021-07-28 13:47:41 ThreadPoolExecutorWithExceptions    A thread failed:
java.lang.NullPointerException
    at picard.util.AdapterMarker.lambda$tallyAndFixAdapters$0(AdapterMarker.java:207)
    at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
    at picard.util.AdapterMarker.tallyAndFixAdapters(AdapterMarker.java:207)
    at picard.util.AdapterMarker.adapterTrimIlluminaPairedReads(AdapterMarker.java:218)
    at picard.util.AdapterMarker.adapterTrimIlluminaPairedReads(AdapterMarker.java:190)
    at picard.illumina.ClusterDataToSamConverter.convertClusterToOutputRecord(ClusterDataToSamConverter.java:299)
    at picard.illumina.ClusterDataToSamConverter.convertClusterToOutputRecord(ClusterDataToSamConverter.java:52)
    at picard.illumina.UnsortedBasecallsConverter$TileRecordToWriterPump.run(UnsortedBasecallsConverter.java:113)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Exception: picard.PicardException thrown from the UncaughtExceptionHandler in thread "pool-2-thread-10"
INFO    2021-07-28 13:47:44 UnsortedBasecallsConverter  Processed     1,000,000 records.  Elapsed time: 00:00:31s.  Time for last 1,000,000:    2s.  Last read position: */*
INFO    2021-07-28 13:47:46 UnsortedBasecallsConverter  Processed     2,000,000 records.  Elapsed time: 00:00:34s.  Time for last 1,000,000:    2s.  Last read position: */*
[Wed Jul 28 13:47:49 PDT 2021] picard.illumina.IlluminaBasecallsToSam done. Elapsed time: 0.62 minutes.
Runtime.totalMemory()=13145473024
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" picard.PicardException: Exceptions in tile processing. There were 0 tasks still running or queued and they have been cancelled. Errors: java.lang.NullPointerException
    at picard.illumina.BasecallsConverter.interruptAndShutdownExecutors(BasecallsConverter.java:318)
    at picard.illumina.UnsortedBasecallsConverter.awaitTileWriting(UnsortedBasecallsConverter.java:172)
    at picard.illumina.UnsortedBasecallsConverter.processTilesAndWritePerSampleOutputs(UnsortedBasecallsConverter.java:133)
    at picard.illumina.IlluminaBasecallsToSam.doWork(IlluminaBasecallsToSam.java:278)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:308)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)
matthdsm commented 3 years ago

Solved as of v2.26

clintval commented 2 years ago

We still hit this with v2.26.8 of Picard

I do not believe it is solved yet