arpcard / rgi

Resistance Gene Identifier (RGI). Software to predict resistomes from protein or nucleotide data, including metagenomics data, based on homology and SNP models.

Controlling number of tmp files generated in hpc environment #229

Closed Amjadhpc closed 11 months ago

Amjadhpc commented 1 year ago

Hello, I am using rgi-main with the --clean option, num-threads of 64, and --split-prodigal-jobs on GPFS storage.

This generates a large number of tmp files, and the inodes on the GPFS storage are running out.

My question is: does the clean option only take effect after the run is completed, with no way to remove tmp files in the middle of a run?

Also, is there an option that can limit the number of tmp files generated?

Thanks

sophieleech commented 1 year ago

I am having the same issue and am exceeding my disk quota due to the number of temp files generated on an HPC system.

raphenya commented 1 year ago

@Amjadhpc @sophieleech I will test this and update you on why the temp files are not being removed.

Amjadhpc commented 1 year ago

Hi @raphenya any update on this?

raphenya commented 11 months ago

@Amjadhpc @sophieleech The temporary files are removed for each input file at the end of a run. So, having lots of input files will cause problems as the prodigal step is slow.

TomasaSbaffi commented 10 months ago

I also had this issue and solved it by dividing the multi-FASTA assembly files into smaller chunks of 50k sequences each. Any tool will do the job; I used seqkit split2 (-s 50000).
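
For anyone without seqkit on their cluster, the same chunking idea can be sketched in plain Python. This is only a minimal sketch and not part of RGI or seqkit: the function name `split_fasta`, the `chunk_size` parameter, and the `.part_NNN.fasta` output naming are all made up for illustration. It assumes an uncompressed FASTA file in which every record starts with a `>` header line.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir, chunk_size=50000):
    """Split a multi-FASTA file into files holding at most chunk_size records.

    Hypothetical helper for illustration; returns the list of chunk paths.
    """
    fasta_path = Path(fasta_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []      # paths of the chunk files created so far
    out = None        # handle of the chunk currently being written
    n_records = 0     # number of FASTA headers seen so far

    with fasta_path.open() as handle:
        for line in handle:
            if line.startswith(">"):
                # Start a new chunk file every chunk_size records.
                if n_records % chunk_size == 0:
                    if out is not None:
                        out.close()
                    name = f"{fasta_path.stem}.part_{len(written) + 1:03d}.fasta"
                    part = out_dir / name
                    out = part.open("w")
                    written.append(part)
                n_records += 1
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)

    if out is not None:
        out.close()
    return written
```

Each resulting chunk can then be passed to RGI as a separate input, so that, as described above, its temporary files are cleaned up when that chunk finishes rather than accumulating over the whole assembly.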