NBISweden / AGAT

Another Gtf/Gff Analysis Toolkit
GNU General Public License v3.0

AGAT takes >12 hours on some GFFs on HPC #439

Closed chriswyatt1 closed 3 months ago

chriswyatt1 commented 3 months ago

Hi,

First, thanks for maintaining a great tool. I may be doing something wrong, but I have been using the "agat_sp_keep_longest_isoform.pl" tool, and while it finishes within a few minutes about 95% of the time, some GFFs take more than 12 hours. Maybe this is the expected behaviour, in which case it may not be an issue as such.

For example, Bombus terrestris took ~14 hours once the job was running (it was not stuck in an HPC queue): agat_sp_keep_longest_isoform.pl -gff Bombus_terrestris-GCA_910591885.2-2022_03-genes.gff3.gz -o Bombus_terrestris.longest.gff3

The gff can be downloaded here: GFF3


To Reproduce

The script I run is as follows:

agat_sp_keep_longest_isoform.pl -gff Bombus_terrestris-GCA_910591885.2-2022_03-genes.gff3.gz -o Bombus_terrestris.longest.gff3

This is my output: out.txt

Juke34 commented 3 months ago

Run standalone on my laptop, your file took ~1 min, so I guess the problem is related to your infrastructure. It might be due to a lack of available RAM: AGAT loads everything into RAM to work efficiently, and if there is not enough RAM the computer falls back on swap (on disk), which can be very slow. It could also be due to concurrency with other jobs, e.g. if the CPU is overloaded by different tasks, or the job might be slowed down by intense I/O, e.g. if a backup is running on the system in the background at the same time.
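One way to check the RAM/swap hypothesis on a Linux node is to look at `/proc/meminfo` while the job runs. The sketch below is only an illustration and is not part of AGAT: the function names `parse_meminfo` and `memory_summary` are hypothetical, and it assumes a Linux kernel recent enough to report `MemAvailable`.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer value}.

    Memory fields in /proc/meminfo are reported in kB.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_summary(text):
    """Return available RAM and swap currently in use, both in GiB."""
    info = parse_meminfo(text)
    kib_per_gib = 1024 ** 2
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "available_gib": info.get("MemAvailable", 0) / kib_per_gib,
        "swap_used_gib": swap_used / kib_per_gib,
    }
```

On a Linux machine it could be used as `memory_summary(open("/proc/meminfo").read())`. Little available RAM together with growing swap usage would be consistent with the explanation above. Note that on a cluster the scheduler's per-job memory limit may be lower than what the node itself reports.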

chriswyatt1 commented 3 months ago

Ahhh, ok great. I will check whether I can increase the RAM to get this job through. Thanks for your help!

Juke34 commented 3 months ago

To be on the safe side, it is good to have 10 times the size of the file available as RAM. So for a file of 100 MB, it is good to have 1 GB of RAM available.
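The 10x rule of thumb can be turned into a small helper for sizing a job's memory request. This is a minimal sketch, not an AGAT utility: the function names are hypothetical, and measuring the decompressed size of `.gz` inputs is an assumption on my part, since the thread does not say whether the rule refers to the compressed or the uncompressed file.

```python
import gzip
import os


def uncompressed_size(path, chunk_size=1 << 20):
    """Return the content size in bytes, decompressing .gz files to measure them.

    ASSUMPTION: the rule of thumb is applied to the uncompressed size.
    """
    if path.endswith(".gz"):
        total = 0
        with gzip.open(path, "rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
        return total
    return os.path.getsize(path)


def recommended_ram_bytes(path, factor=10):
    """Apply the rule of thumb: RAM to have available = factor * file size."""
    return factor * uncompressed_size(path)
```

For the example in the comment above, a 100 MB file gives 10 x 100 MB = 1 GB. The result can then be compared with the memory actually granted to the job.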