Illumina / Isaac4

Isaac aligner version 4

Failed to allocate a file handle #17

Open maggietsui opened 2 years ago

maggietsui commented 2 years ago

Hi,

I am testing isaac-align on one WGS sample, and this is the command I used:

/wynton/home/rotation/mtsui/ASE_project/Isaac4/bin/isaac-align \
    -r ~/ASE_project/GRCh38.d1.vd1.fa \
    -b $WES_DIR/$patient \
    --base-calls-format fastq-gz \
    -o $WES_DIR/$patient \
    -t $TMPDIR \
    -m 40

It appears to run fine and then I get a "Failed to allocate a file handle" error. I included the last lines of the error log here:

2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:2/4 6347304960vm 818531res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:3/4 6347304960vm 818531res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:3/4 6348369920vm 818531res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: before ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:4/4 6348369920vm 818531res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: ClusterHashMatchFinder::ClusterHashMatchFinder seedsPerMatch:4/4 6349172736vm 818531res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: TemplateBuilder before shadowList_.reserve 6360207360vm 818531res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: TemplateBuilder before bestCombinationPairInfo_.reserve 6361202688vm 818531res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: TemplateBuilder before bestRescuedPair_.reserve 6361202688vm 818597res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: TemplateBuilder before candidates_.reserve 6361202688vm 818597res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: TemplateBuilder after candidates_.reserve 6361452544vm 818597res
2022-01-20 13:31:01     [2b9b51e73cc0]  STAT: Constructed match selector 6361452544vm 818597res
2022-01-20 13:31:11     [2b9b51e73cc0]  STAT: Constructing ReferenceHasher: for 16-mers  24221159424vm 5023324res
2022-01-20 13:31:36     [2b9b51e73cc0]   a:3308323 b:7048005 buckets:4294967296 and 2942474370 genome 16-mers  unique k-mers found 954415372 unique keys. maxUniqueKeys:954415372
2022-01-20 13:31:42     [2b9b51e73cc0]  STAT:  reserving memory done for 2942474370 positions 35991060480vm 8052378res
2022-01-20 13:32:04     [2b9b51e73cc0]   generated 2942474370 positions
2022-01-20 13:32:16     [2b9b51e73cc0]   sorted 2942474370 positions
2022-01-20 13:32:16     [2b9b51e73cc0]  STAT: AlignWorkflow::selectMatches  35312574464vm 7886759res
2022-01-20 13:32:16     [2b9b51e73cc0]  Selecting matches using BinIndexMap(10000bl)
2022-01-20 13:32:16     [2b9b51e73cc0]  STAT: before buildBinPathList 35312574464vm 7886786res
2022-01-20 13:32:18     [2b9b51e73cc0]  found 1757 unique bin paths in 312770 bins
Error: 2022-Jan-20 13:32:21: Too many open files: /wynton/home/rotation/mtsui/ASE_project/Isaac4/src/c++/include/io/FileBufWithReopen.hh(56): Throw in function isaac::io::basic_FileBufWithReopen<_CharT, _Traits>::basic_FileBufWithReopen(const isaac::io::basic_FileBufWithReopen<_CharT, _Traits>&) [with _CharT = char; _Traits = std::char_traits<char>]
Dynamic exception type: boost::exception_detail::clone_impl<isaac::common::IoException>
std::exception::what: Failed to allocate a file handle
: Failed to allocate a file handle

Any suggestions would be appreciated, thanks!!

Maggie

rpetrovski commented 2 years ago

Can you please post the output of the ulimit -a command?

maggietsui commented 2 years ago

Thanks for the quick response!

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 2063554
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

rpetrovski commented 2 years ago

It appears that you have the standard limit of 1024 simultaneously open files. Isaac needs more than that to be able to store data temporarily on disk.

You need to increase ulimit -n. I would recommend going to 10240 to make sure the issue does not occur on bigger data sets. You might need administrative privileges on the box to alter this limit.

R.
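
For reference, the advice above can be sketched as a small bash snippet. This is only an illustration, not something posted in the thread: the helper name raise_nofile is made up, and it assumes the hard limit on the node is at least as high as the requested value (otherwise it falls back to the hard limit, and an administrator would have to raise that).

```shell
#!/usr/bin/env bash
# Sketch: raise the soft open-files limit for the current shell (and its
# children) without exceeding the hard limit. raise_nofile is a hypothetical
# helper name, not part of Isaac.
raise_nofile() {
    local want="$1"
    local soft hard
    soft="$(ulimit -Sn)"
    hard="$(ulimit -Hn)"

    # Nothing to do if the soft limit is already high enough.
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$want" ]; then
        return 0
    fi

    # An unprivileged user cannot go above the hard limit, so cap the request.
    if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
        echo "hard limit is $hard; using that instead of $want" >&2
        want="$hard"
    fi

    ulimit -Sn "$want"
}

raise_nofile 10240
echo "open files (soft): $(ulimit -Sn)"
```

The limit is per process and inherited by child processes, so this would have to run in the same shell or job script that later launches isaac-align.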

maggietsui commented 2 years ago

Thank you so much, that tip worked and my test sample ran successfully. However, I think there is a problem with my command, as when I tried to run all of my ~100 samples on my cluster, the jobs were using way more CPU cores than I specified. A sysadmin reached out to me:

"Your jobs on Wynton (job ID 570950) were using far more CPU cores than the 8 requested by each task. The process "isaac-align" looked to be trying to use every core in whatever node it was running on."

Here are the SGE submission parameters I used:

#$ -cwd
#$ -pe smp 8
#$ -l mem_free=8G
#$ -l scratch=50G 
#$ -l h_rt=48:00:00
#$ -t 1-100
#$ -e /wynton/scratch/mtsui/WASP_step2_logs
#$ -o /wynton/scratch/mtsui/WASP_step2_logs

and here is the isaac-align command:

/wynton/home/rotation/mtsui/ASE_project/Isaac4/bin/isaac-align \
    -r ~/ASE_project/GRCh38.d1.vd1.fa \
    -b $WES_DIR/$patient \
    --base-calls-format fastq-gz \
    -o $WES_DIR/$patient \
    -t $TMPDIR \
    -m 60 

Is there a parameter that I should be using to limit the number of cores used? Would adding --jobs 8 fix this issue?

rpetrovski commented 2 years ago

Yep. That's what it does. It was designed to take maximum advantage of the hardware. The -j option limits the number of threads that do heavy compute, but it has no effect on idle threads or on those designated for I/O, so the process will still create more threads than the -j value.

The -j option is mainly for debugging hardware bottlenecks. I'm not sure it will help keep your sysadmins happy.

The best strategy is to submit one isaac-align process per node and let it use the entire box. You can also try running multiple samples per run using the --sample-sheet parameter. This would save time if the individual samples are small relative to the startup cost.

Another tip is to make sure --temp-directory points to fast, low-latency local storage such as an SSD.

R.
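
As a rough sketch of how the -j suggestion could be wired into the job script from this thread: the snippet below only builds and prints the command line, it does not run the aligner. It assumes the job runs under an SGE parallel environment, where the scheduler sets NSLOTS to the number of granted slots; the fallback values and the ISAAC_BIN variable are placeholders for illustration. As noted above, -j caps only the compute threads, so the process may still start additional idle and I/O threads.

```shell
#!/usr/bin/env bash
# Sketch only: assemble the isaac-align command with -j tied to the slots
# granted by SGE. Fallbacks after ':-' are illustrative placeholders.
NSLOTS="${NSLOTS:-8}"                   # set by SGE for "-pe smp N" jobs
ISAAC_BIN="${ISAAC_BIN:-isaac-align}"   # hypothetical variable for the binary path
sample_dir="${WES_DIR:-.}/${patient:-sample}"

cmd=(
    "$ISAAC_BIN"
    -r "$HOME/ASE_project/GRCh38.d1.vd1.fa"
    -b "$sample_dir"
    --base-calls-format fastq-gz
    -o "$sample_dir"
    -t "${TMPDIR:-/tmp}"                # ideally fast node-local scratch
    -m 60
    -j "$NSLOTS"                        # limits heavy-compute threads only
)

# Print the command for inspection; replace with "${cmd[@]}" to execute it.
printf '%q ' "${cmd[@]}"
echo
```

Keeping the command in an array makes it easy to inspect what would be run for each task before submitting the full array job.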