matthdsm closed this issue 3 years ago
Full stack trace with a bunch of different errors: stderr.txt. This includes logs for both tools.
Hi @matthdsm what is your ulimit set at?
Hi George,
Thanks for your reply. Currently the ulimit is set to 32000, but we've gone as high as 64000 without any change.
M
Hi @matthdsm from the key developer on this:
echo "* soft nofile 64000" >> /etc/security/limits.conf
echo "* hard nofile 128000" >> /etc/security/limits.conf
sysctl -w fs.file-max=128000
sysctl -p
64k soft, 128k hard
Also, if you are running in a container you need to make sure the docker daemon includes the same options:
echo "OPTIONS=\"--default-ulimit nofile=64000:128000\" " > /etc/sysconfig/docker
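To confirm the limits actually took effect for the shell that will launch Picard, a quick check like the following can help (a minimal sketch; run it in the same session, or inside the container, that the tools will run in):

```shell
# Show the soft and hard per-process open-file limits for the
# current shell, plus the system-wide open-file ceiling.
echo "soft nofile: $(ulimit -Sn)"
echo "hard nofile: $(ulimit -Hn)"
echo "fs.file-max: $(cat /proc/sys/fs/file-max)"
```

If the soft value does not match what was written to limits.conf, the session was likely started before the change (or the daemon config was not reloaded).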
Hi George, Sadly that won't help: both the hard and soft limits are already set about tenfold higher than what you suggest.
Matthias
@matthdsm this is strange. A couple more questions / suggestions: 1) What sequencer are you using? 2) How many libraries?
Could you try running something like: watch -n 10 ls -l /proc/$$/fd
to watch the open file handle count as you are running these tools.
Also, just before running each, can you cat /proc/sys/fs/file-max, just to verify that the limit is set properly?
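For a one-shot count rather than eyeballing the full ls -l listing, counting the entries under /proc/<pid>/fd gives the number directly. A small sketch, using the current shell's PID as a stand-in for the Picard JVM's:

```shell
# Count open file descriptors for a given process.
# $$ (the current shell) stands in for the Picard JVM's PID here;
# substitute the real PID (e.g. from pgrep -f picard) when monitoring a run.
pid=$$
count=$(ls "/proc/$pid/fd" | wc -l)
echo "pid $pid has $count open file descriptors"
```

Wrapping the last two lines in watch -n 10 reproduces the periodic monitoring suggested above.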
We're using a NovaSeq 6000 with S1 flowcells (for now). Typically there are 2-3 libraries on both lanes. We would like to expand this to S4 flowcells with multiple libraries containing several hundred samples.
I'll go ahead and try to monitor the open filehandles on our next run.
Thanks M
$ sudo cat /proc/sys/fs/file-max
6550835
ExtractIlluminaBarcodes starts throwing errors at 16377 open file handles.
The runtime also seems to have worsened in comparison to v2.25.6: ExtractIlluminaBarcodes used to run in 10 min with 30 threads on v2.25.6, while it took 43 min with 70(!) threads on v2.26.1.
I've added the output here for reference
Thanks for looking into this. M
When localising the cbcl files, the open file handles hover at around 2K and the process finishes without errors in about 3 min (v2.26.1). Same for v2.25.6, but then it finishes in about 2 min.
IlluminaBasecallsToSam (v2.26.1) also finishes successfully, although it took forever (1h50m runtime on 70 threads). The open file handles hover around 22K for the entire process.
Keeping all this in mind, I surmise that the too many open files error occurred because the basecalls were read from a network share instead of local storage.
PS: all of the numbers above are for a single lane, so double them for the complete flowcell.
Yes, that makes a lot of sense. Maybe NFS has its own file limit, or has more overhead with regard to keeping track of file usage.
It is definitely going to be a lot more performant to run ExtractIlluminaBarcodes and IlluminaBasecallsToSam on local storage.
You might also want to look at IlluminaBasecallsToSam's new MATCH_BARCODES_INLINE parameter - this adds inline demultiplexing, so you should be able to skip the ExtractIlluminaBarcodes (EIB) call entirely.
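For reference, an inline-demultiplexing invocation might look roughly like this (a sketch only: all paths, the run barcode, and the read structure are placeholders, and the exact parameter set for your Picard version should be confirmed with the tool's built-in help):

```shell
# Hypothetical sketch: IlluminaBasecallsToSam with inline barcode
# matching enabled, skipping the separate ExtractIlluminaBarcodes step.
# Every value below is a placeholder, not a tested configuration.
java -jar picard.jar IlluminaBasecallsToSam \
    BASECALLS_DIR=/data/run/Data/Intensities/BaseCalls \
    LANE=1 \
    RUN_BARCODE=RUN123 \
    READ_STRUCTURE=151T8B8B151T \
    LIBRARY_PARAMS=library_params.txt \
    MATCH_BARCODES_INLINE=true \
    NUM_PROCESSORS=16
```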
I didn't know the MATCH_BARCODES_INLINE option even existed! This seems like a very useful addition, since it eliminates the need for both a barcodes file and a library_params file. Are you open to feature requests (in a separate issue)? Currently we use the barcodes file from EIB to get the most frequent undetermined barcodes. It would be useful if these were added to the barcode metrics file output (cf. the bcl2fastq stats).
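In the meantime, the most frequent barcodes can be tallied from any per-read barcode listing with standard tools. A sketch only: the demo file and its layout (observed barcode in the first tab-separated column) are assumptions, not the actual EIB output format:

```shell
# Hypothetical sketch: rank the most frequent barcodes in a
# barcode-first, tab-separated file. barcodes_demo.txt is a
# stand-in for the real EIB barcode output.
printf 'AAAA\nCCCC\nAAAA\nGGGG\nAAAA\n' > barcodes_demo.txt
cut -f1 barcodes_demo.txt | sort | uniq -c | sort -rn | head -n 3
# first line of output: "3 AAAA"
```

Filtering the file to unmatched reads first would restrict the ranking to undetermined barcodes only.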
M
I've opened a new issue (#1722) with my question. This issue seems resolved to me, since it was caused mostly by FS limitations.
Thanks for all the help! Matthias
Bug Report
Affected tool(s)
ExtractIlluminaBarcodes, IlluminaBasecallsToSam
Affected version(s)
Description
When running ExtractIlluminaBarcodes and IlluminaBasecallsToSam, the tools fail intermittently with a too many open files error. This issue existed in v2.25.6, but seems to have become consistent in v2.26.0. Raising the user ulimit does not help, nor does setting MAX_RECORDS_IN_RAM to a higher value.
Steps to reproduce
Run either tool with mostly default params
Expected behavior
Tools should finish without error
Actual behavior
Tools fail with a too many open files error.
First mentioned in https://github.com/broadinstitute/picard/issues/1699#issuecomment-899542458. Ping @caherzee, this is the one we discussed at the meeting last week.
I'll provide a more detailed stacktrace shortly.
M