GATKSupportTeam opened this issue 2 years ago.
@gbggrant This error came up on the GATK Forum. Is something going wrong in ExtractIlluminaBarcodes that makes it open 120000 files? This user has a limit of 100000 and has already tried increasing --MAX_RECORDS_IN_RAM.
We've seen some reports of this. I believe Fulcrum Genomics (who submitted some recent changes to this code) is looking into it.
This request was created from a contribution made by Robert Altwasser on April 19, 2022 10:09 UTC.
Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/5461192217627-Picard-Too-many-open-files-
--
I am demultiplexing an S4 sequencing run, and Picard ExtractIlluminaBarcodes opens too many files, which crashes the run. It's dual-index data with UMIs, and I need unmapped BAM files with the UMI sequence. I checked the MD5 sums of the raw data several times and also ran a check on the BaseCalls directory. I monitored the open files of the process with 'lsof', and the count quickly exceeds 120000, which is the maximum I can set with 'ulimit -n'. Here is the run information:
a) Versions:
The Genome Analysis Toolkit (GATK) v4.2.5.0
HTSJDK Version: 2.24.1
Picard Version: 2.25.4
Java: openjdk version "1.8.0_312"
b) Exact command used:
(bash) $ ulimit -n 100000
picard -Xmx110g -Djava.io.tmpdir=/data/gpfs-1/users/altwassr_c/scratch/tmp/ -Xms110g \
ExtractIlluminaBarcodes \
-B /data/gpfs-1/users/altwassr_c/scratch/data/220325_A00643/Data/Intensities/BaseCalls/ \
-L 1 \
--NUM_PROCESSORS 1 \
-M metrices/barcode_metrices1.txt \
-BARCODE_FILE /data/gpfs-1/users/altwassr_c/work/projekte/barcode1.csv \
-RS 148T8B9M8B148T \
--MAX_RECORDS_IN_RAM 1000000000 \
--TMP_DIR /data/gpfs-1/users/altwassr_c/scratch/tmp/
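The read structure 148T8B9M8B148T passed with -RS splits the sequencing cycles into segments. In Picard's notation each segment is a length in cycles followed by a type: T is template, B is sample barcode and M is molecular barcode (the UMI). A small bash sketch can split the string and total the cycles; the total should match the number of per-cycle directories (like the C237.1 directory in the log path) the tool reads for the lane. The function name read_structure_summary is hypothetical, and the sketch assumes only the T, B, M and S (skip) segment types.

```shell
#!/usr/bin/env bash
# Sketch: split a Picard-style read structure into <length><type> segments
# and total the cycles. Assumes only T, B, M and S segment types.
read_structure_summary() {
  local rs="$1"
  local total=0 seg len type parsed

  # Reject strings containing anything other than <digits><T|B|M|S> segments.
  parsed="$(printf '%s\n' "$rs" | grep -oE '[0-9]+[TBMS]' | tr -d '\n')"
  if [[ "$parsed" != "$rs" ]]; then
    echo "invalid read structure: ${rs}" >&2
    return 1
  fi

  # grep -o prints each segment on its own line, e.g. "148T", "8B", ...
  for seg in $(printf '%s\n' "$rs" | grep -oE '[0-9]+[TBMS]'); do
    len="${seg%?}"      # drop the trailing type letter -> cycle count
    type="${seg: -1}"   # keep only the type letter
    echo "${type} ${len}"
    total=$((total + len))
  done
  echo "total ${total}"
}

read_structure_summary "148T8B9M8B148T"
# T 148
# B 8
# M 9
# B 8
# T 148
# total 321
```

For this run that gives 321 cycles: two 148-cycle template reads, two 8-cycle sample barcodes and a 9-cycle UMI.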
c) Log:
ERROR 2022-04-19 04:41:06 ExtractIlluminaBarcodes Error processing tile 2140
picard.PicardException: File not found: (/data/gpfs-1/users/altwassr_c/scratch/data/220325_A00643_0438_BH22YTDSX2/Data/Intensities/BaseCalls/L002/C237.1/L002_1.cbcl)
at picard.illumina.parser.readers.BaseBclReader.open(BaseBclReader.java:93)
at picard.illumina.parser.readers.CbclReader.readHeader(CbclReader.java:127)
at picard.illumina.parser.readers.CbclReader.readTileData(CbclReader.java:200)
at picard.illumina.parser.readers.CbclReader.advance(CbclReader.java:275)
at picard.illumina.parser.readers.CbclReader.hasNext(CbclReader.java:252)
at picard.illumina.parser.NewIlluminaDataProvider.hasNext(NewIlluminaDataProvider.java:125)
at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.run(ExtractIlluminaBarcodes.java:363)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /data/gpfs-1/users/altwassr_c/scratch/data/220325_A00643_0438_BH22YTDSX2/Data/Intensities/BaseCalls/L002/C237.1/L002_1.cbcl (Too many open files)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.(FileInputStream.java:138)
at picard.illumina.parser.readers.BaseBclReader.open(BaseBclReader.java:90)
... 11 more
INFO 2022-04-19 04:41:06 ExtractIlluminaBarcodes Extracting barcodes for tile 2141
ERROR 2022-04-19 04:41:06 ExtractIlluminaBarcodes Error processing tile 2141
picard.PicardException: Unrecognized data type(Cbcl) found by IlluminaDataProviderFactory!
at picard.illumina.parser.IlluminaDataProviderFactory.makeParser(IlluminaDataProviderFactory.java:400)
at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:249)
at picard.illumina.parser.IlluminaDataProviderFactory.makeDataProvider(IlluminaDataProviderFactory.java:228)
at picard.illumina.ExtractIlluminaBarcodes$PerTileBarcodeExtractor.run(ExtractIlluminaBarcodes.java:355)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
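The "Too many open files" cause in the trace above means the process hit its per-process open-file limit. As a Linux-only sketch, assuming a readable /proc filesystem, the bash function below (the name fd_usage is hypothetical) counts a process's actual file descriptors from /proc/<pid>/fd and reads its soft and hard limits from /proc/<pid>/limits. This can be a more precise check than counting lsof lines, because lsof also lists entries that are not file descriptors, such as memory-mapped files and the current working directory.

```shell
#!/usr/bin/env bash
# Sketch (Linux only): report how many file descriptors a process holds
# next to its soft and hard "Max open files" limits, read from /proc.
fd_usage() {
  local pid="$1"
  local fd_dir="/proc/${pid}/fd"
  if [[ ! -d "$fd_dir" ]]; then
    echo "no such process: ${pid}" >&2
    return 1
  fi
  if [[ ! -r "$fd_dir" ]]; then
    echo "cannot read ${fd_dir}; run as the process owner or root" >&2
    return 1
  fi

  # Each entry in /proc/<pid>/fd is exactly one open descriptor.
  local open
  open="$(ls -1 "$fd_dir" | wc -l)"
  open=$((open))  # normalise any whitespace padding from wc

  # Row format: "Max open files   <soft>   <hard>   files"
  local soft hard
  read -r soft hard < <(awk '/^Max open files/ {print $4, $5}' "/proc/${pid}/limits")
  echo "open=${open} soft=${soft} hard=${hard}"
}

# Example: inspect the current shell.
fd_usage "$$"
```

For example, running fd_usage every few seconds against the PID of the Java process running ExtractIlluminaBarcodes (found with ps or pgrep) would show whether the open count climbs steadily toward the soft limit. That soft limit is the one set by 'ulimit -n' in the launching shell and inherited by the Java process.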
(created from Zendesk ticket #281653)
gz#281653