Open · jlw-ecoevo opened this issue 2 years ago

Is there a way to set a memory cap for METABOLIC? Is it sufficient just to reduce the number of cores used? (I was running with 40 cores and it ate up 1 TB of RAM, which was unexpected and a little unkind to my lab mates.)
We don't have an option to control memory usage. Do you know in which step METABOLIC uses so much memory? I might be able to adjust it.
Hi! I believe it was in both the KEGG Ortholog and dbCAN steps (I ended up cancelling the job during the dbCAN step).
Hi! Those two steps both involve hmmsearch and hmmscan. I looked into it: since HMMER 4 is still in development, it seems that's how it is for now. Maybe reducing the CPU thread number is the only way.
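For illustration, here is a minimal sketch of what a thread-capped call looks like. `--cpu` and `--tblout` are standard hmmsearch (HMMER 3.x) options; the database and FASTA file names below are placeholders, not METABOLIC's actual intermediate files:

```perl
#!/usr/bin/perl
# Minimal sketch: run hmmsearch with a reduced thread count.
# "kofam.hmm" and "genome.faa" are placeholder file names.
use strict;
use warnings;

my $threads = 4;   # fewer worker threads => fewer concurrent allocations
my $hmm_db  = "kofam.hmm";
my $faa     = "genome.faa";

# --cpu and --tblout are standard hmmsearch (HMMER 3.x) options.
system("hmmsearch", "--cpu", $threads, "--tblout", "hits.tbl",
       $hmm_db, $faa) == 0
    or die "hmmsearch failed: $?";
```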
Ah ok, thanks - it might be worth warning users about this in the docs - I wasn't expecting the memory load to be so high.
Sure. I will add this info to the GitHub docs.
Also - are you 100% sure it's hmmsearch's fault? The first hmmsearch step seems to run without any issues ("The hmmsearch is running with 40 cpu threads...") and each hmmsearch process takes very little RAM (I've never run into memory issues with hmmsearch and have often run it on many more cores for big jobs). The processes that seem to be taking up a lot of RAM are Perl processes in the later KEGG/dbCAN steps, after the hmmsearch step has finished. For reference, this is running METABOLIC-G.pl on a folder with amino-acid sequences from around 9k genomes. Thanks!
Hi, after hmmsearch we use a hash to store all the hit names for each genome. If the number of input genomes is very large, the hash will be very large too. Maybe this is the reason. By the way, it will take a very long time to process 9k genomes; I suggest dividing them into batches of about 2,000 genomes each.
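For reference, a minimal sketch of that batching workaround, assuming one protein FASTA (`.faa`) per genome in a single input folder. The folder layout and the METABOLIC-G.pl invocation in the trailing comment are assumptions about a typical setup (check the README for exact flags):

```perl
#!/usr/bin/perl
# Minimal sketch: split a large input folder into batches of ~2,000
# genomes, then run METABOLIC-G.pl on each batch separately.
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);

my $in_dir     = "all_genomes";   # assumed: one .faa file per genome
my $batch_size = 2000;

opendir(my $dh, $in_dir) or die "Cannot open $in_dir: $!";
my @faa = sort grep { /\.faa$/ } readdir($dh);
closedir($dh);

my $batch = 0;
while (my @chunk = splice(@faa, 0, $batch_size)) {
    $batch++;
    my $out = sprintf("batch_%02d", $batch);
    make_path($out);
    move("$in_dir/$_", "$out/$_") for @chunk;
    # then run METABOLIC on each batch, e.g.:
    # perl METABOLIC-G.pl -in-gn batch_01 -t 20 -o batch_01_out
}
```

Batching should also keep the per-run hash of hit names bounded, since it only ever holds the results for one batch at a time.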