emo-bon / MetaGOflow

MGnify oriented implementation for the Marine Genomic Observatories oriented pipeline, developed in the framework of an EOSC-Life funded project
https://metagoflow.readthedocs.io
Apache License 2.0

diamond CPU overload #23

Open hariszaf opened 1 year ago

hariszaf commented 1 year ago

Diamond asks for 5 * causing overload.

A thorough description of its behavior could benefit the workflow.

jprmachado commented 1 year ago

https://github.com/emo-bon/MetaGOflow/blob/d8499a708825f03aa90aff520433c294c06559a6/tools/Assembly/Diamond/Diamond.blastp.cwl#L105

It currently uses the number of threads from config.yaml; I guess it could be refined by using a new parameter instead of threads.

cymon commented 9 months ago

From my observations: each chunk gets a diamond job with config.yml:threads allocated, e.g. on a 32-thread machine each of 5 chunks is allocated 32 threads, massively oversubscribing the CPU. But I don't think they run concurrently; at least, analysis of one chunk didn't seem to start until one of the other 4 chunks had finished. I'm not sure whether this is the actual behaviour of cwl-runner/the Linux scheduler, or just an artefact of the logging. If each chunk is allocated total_num_of_threads/number_of_chunks CPUs (32/5 in the previous example), all diamond chunk analyses seem to run concurrently.
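
To make the arithmetic in the comment above concrete, here is a minimal Python sketch of the per-chunk split. It is not code from MetaGOflow: the function names `threads_per_chunk` and `requested_threads` are hypothetical, and rounding down with a floor of one thread per chunk is an assumption, since the thread does not say how 32/5 should be rounded.

```python
def requested_threads(threads_per_job: int, n_chunks: int) -> int:
    """Total threads requested if every chunk job asks for threads_per_job."""
    return threads_per_job * n_chunks


def threads_per_chunk(total_threads: int, n_chunks: int) -> int:
    """Split a thread budget evenly across chunks.

    Assumption: round down, but never hand a chunk fewer than one thread.
    """
    if total_threads < 1 or n_chunks < 1:
        raise ValueError("total_threads and n_chunks must both be >= 1")
    return max(1, total_threads // n_chunks)


if __name__ == "__main__":
    total, chunks = 32, 5  # the example from the comment above

    # Each chunk asks for the full thread count: 5 * 32 = 160 on a 32-thread machine.
    print(requested_threads(total, chunks))

    # Splitting the budget instead: 32 // 5 = 6 threads per chunk, 30 in total.
    per_chunk = threads_per_chunk(total, chunks)
    print(per_chunk, requested_threads(per_chunk, chunks))
```

With the split, the five jobs together request at most the machine's thread count, which matches the observation that all chunk analyses then run concurrently. How the resulting value would be passed to the Diamond CWL step is left open here.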