DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

Enabling a --temp-directory parameter #438

Open pasviber opened 3 months ago

pasviber commented 3 months ago

Hi,

I am running a Nextflow pipeline (v24.04.3) that, among many other programs, uses HISAT2 (v2.2.1) from a Singularity container (v3.11.1).

While running the pipeline on a High Performance Computing (HPC) cluster, it stopped during a HISAT2 alignment process with the following error:

(ERR): mkfifo(/tmp/44.inpipe1) failed.
Exiting now ...

I have checked the free space and the permissions of the /tmp folder, and everything is fine.

However, it seems that when 50 hisat2 tasks in total are running at the same time on the cluster, tasks that land on the same node can generate the same /tmp/$$.inpipe1 file name. That is, if one task has created /tmp/44.inpipe1 and /tmp/44.inpipe2, another task that tries to create /tmp/44.inpipe1 fails with the error above.

The job accounting records below show one such case:

JobID        JobName    QOS   Planned  Start               End                 User     Elapsed  ReqCPUS AllocCPUS ReqMem MaxRSS   TotalCPU  State     ExitCode NodeList
------------ ---------- ----- -------- ------------------- ------------------- -------- -------- ------- --------- ------ -------- --------- --------- -------- --------
604005       nf-ALIGNM+ short 00:00:00 2024-08-05T16:36:08 2024-08-05T16:57:32 pasviber 00:21:24 6       6         6G              02:01:32  COMPLETED 0:0      cn02
604005.batch batch                     2024-08-05T16:36:08 2024-08-05T16:57:32          00:21:24 6       6                2072172K 02:01:32  COMPLETED 0:0      cn02
606200       nf-ALIGNM+ short 00:00:01 2024-08-05T17:11:19 2024-08-05T17:41:11 pasviber 00:29:52 6       6         6G              02:48:42  COMPLETED 0:0      cn02
606200.batch batch                     2024-08-05T17:11:19 2024-08-05T17:41:11          00:29:52 6       6                2158316K 02:48:42  COMPLETED 0:0      cn02
606206       nf-ALIGNM+ short 00:00:00 2024-08-05T17:20:58 2024-08-05T17:20:59 pasviber 00:00:01 6       6         6G              00:00.726 FAILED    17:0     cn02
606206.batch batch                     2024-08-05T17:20:58 2024-08-05T17:20:59          00:00:01 6       6                0        00:00.726 FAILED    17:0     cn02

These three jobs are HISAT2 alignment processes that ran on node cn02. The first job (604005) does not overlap in time with the other two. However, while job 606200 was running and had created /tmp/44.inpipe1 and /tmp/44.inpipe2 in the /tmp folder of cn02, job 606206 was launched on the same node and tried to create /tmp/44.inpipe1 there. Because job 606200 had already created that file, job 606206 failed one second after starting with the error I mentioned at the beginning.
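To make the failure mode concrete, here is a minimal Python sketch of two jobs building FIFO names from the same PID in a shared directory. The helper `make_input_pipes` and the scratch directory are hypothetical; only the `<pid>.inpipe1`/`<pid>.inpipe2` naming comes from this report. It also assumes, without confirmation here, that each job's container has its own PID namespace, which would explain how two concurrent jobs on cn02 both saw PID 44: `$$` is only unique within a single namespace. It needs a POSIX system, since `os.mkfifo` is not available on Windows.

```python
import errno
import os
import shutil
import tempfile


def make_input_pipes(temp_dir, pid):
    """Hypothetical helper reproducing the naming scheme described in this issue:
    <temp_dir>/<pid>.inpipe1 and <temp_dir>/<pid>.inpipe2."""
    paths = [os.path.join(temp_dir, f"{pid}.inpipe{i}") for i in (1, 2)]
    for path in paths:
        os.mkfifo(path, 0o600)  # raises FileExistsError if the path already exists
    return paths


# A scratch directory stands in for the node-wide /tmp shared by both jobs.
shared_tmp = tempfile.mkdtemp()

# Job A: its wrapper sees PID 44 inside its own container.
job_a_pipes = make_input_pipes(shared_tmp, 44)

# Job B: lands on the same node and, in its own PID namespace, also sees PID 44.
try:
    make_input_pipes(shared_tmp, 44)
    failure = None
except FileExistsError as exc:
    failure = exc.errno  # errno.EEXIST ("File exists")

print(failure == errno.EEXIST)  # True
shutil.rmtree(shared_tmp)
```

On Linux, EEXIST is errno 17. It is plausible, but not confirmed in this thread, that this is where the 17:0 exit code of job 606206 comes from.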

This problem was also reported for Centrifuge (https://github.com/DaehwanKimLab/centrifuge/issues/268) and was solved by adding a --temp-directory parameter that sets the directory where the temporary files are stored. Such a parameter would allow each process to use its own folder, for example one named after the sample being aligned, to hold its inpipe1 and inpipe2, so that they cannot collide with those of another process.
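A per-sample temporary directory could be wired in from a pipeline roughly as follows. This is a sketch under stated assumptions: it assumes a HISAT2 build that includes the `--temp-directory` option (per the end of this thread it was merged into the master branch, so it is not in the v2.2.1 release used here). `-x`, `-1`, `-2` and `-S` are standard HISAT2 options; `hisat2_argv`, the index, file and sample names are hypothetical. The `mkfifo` calls only simulate what HISAT2 would create inside each directory; nothing here runs HISAT2.

```python
import os
import shutil
import tempfile


def hisat2_argv(index, reads1, reads2, out_sam, sample, base_dir):
    """Hypothetical wrapper: give each sample its own directory for HISAT2's
    named pipes and pass it via --temp-directory (the option discussed here)."""
    # mkdtemp adds a random suffix, so even two runs of the same sample differ.
    temp_dir = tempfile.mkdtemp(prefix=f"{sample}.", dir=base_dir)
    argv = ["hisat2", "-x", index, "-1", reads1, "-2", reads2,
            "-S", out_sam, "--temp-directory", temp_dir]
    return argv, temp_dir


base = tempfile.mkdtemp()  # stands in for node-local scratch space such as /tmp
argv_a, dir_a = hisat2_argv("genome_idx", "A_1.fq.gz", "A_2.fq.gz", "A.sam", "sampleA", base)
argv_b, dir_b = hisat2_argv("genome_idx", "B_1.fq.gz", "B_2.fq.gz", "B.sam", "sampleB", base)

# Simulate both jobs seeing PID 44: the FIFOs now live in different directories.
for d in (dir_a, dir_b):
    os.mkfifo(os.path.join(d, "44.inpipe1"), 0o600)
    os.mkfifo(os.path.join(d, "44.inpipe2"), 0o600)

print(dir_a != dir_b)  # True
shutil.rmtree(base)
```

In a real pipeline, `argv_a` would be passed to something like `subprocess.run(argv_a, check=True)`, and the caller would remove the temporary directory once HISAT2 exits.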

@imzhangyun, would it be possible to add a --temp-directory parameter to HISAT2?

Another possibility would be for hisat2 to create a uniquely named temporary folder in /tmp for each run, place that process's inpipes inside it, and delete the folder at the end of the execution. Isolating the inpipes of each process this way would avoid repeated names and would also solve the problem.
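The second suggestion, a uniquely named temporary folder that is removed at the end of the run, maps directly onto Python's `tempfile.TemporaryDirectory`. The sketch below only illustrates the idea in Python and is not HISAT2's code; `private_input_pipes` is a hypothetical name, and the FIFO names reuse the `<pid>.inpipe1`/`<pid>.inpipe2` scheme from the report.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def private_input_pipes(base_dir=None):
    """Hypothetical sketch: create a uniquely named directory, place the two
    PID-named FIFOs inside it, and remove everything when the block exits."""
    # TemporaryDirectory uses mkdtemp, which picks an unused random name without
    # race conditions; base_dir=None means the default ($TMPDIR, else /tmp).
    with tempfile.TemporaryDirectory(prefix="hisat2.", dir=base_dir) as private_dir:
        pid = os.getpid()
        pipes = (os.path.join(private_dir, f"{pid}.inpipe1"),
                 os.path.join(private_dir, f"{pid}.inpipe2"))
        for path in pipes:
            os.mkfifo(path, 0o600)
        yield pipes
    # Leaving the with-block deletes private_dir together with both FIFOs.


# Two concurrent users of identical PID-based names no longer collide:
with private_input_pipes() as a, private_input_pipes() as b:
    print(os.path.dirname(a[0]) != os.path.dirname(b[0]))  # True
```

Because the FIFO names stay PID-based, the uniqueness comes entirely from the directory name. One caveat of this design: the cleanup does not run if the process is killed with SIGKILL, so an occasional leftover folder in /tmp is still possible.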

Thank you in advance :)

Pascual

RaqManzano commented 1 month ago

Hi, I'm having the same issue. It would be great to see the proposed solution implemented; it looks simple enough, and this issue really slows things down at larger scale. Thanks @pasviber for doing the detective work.

mourisl commented 1 week ago

I have merged the update to the master branch. Could you please give it a try?

pasviber commented 1 week ago

Hi @mourisl, thank you very much for adding the --temp-directory parameter. I have tested it and it works. My only suggestion would be to add the option to the help message so that the community can find and use it if they need it.