BinPro / CONCOCT

Clustering cONtigs with COverage and ComposiTion

By default, use all available threads #233

Closed alneberg closed 4 years ago

alneberg commented 5 years ago

Currently, the default is to use 1 thread. This can lead to unnecessarily long execution times by mistake.
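A minimal sketch of what the proposed change could look like, assuming an argparse-based CLI; the `--threads` flag name is illustrative and not necessarily CONCOCT's actual interface:

```python
import argparse
import multiprocessing

parser = argparse.ArgumentParser()
parser.add_argument(
    "-t", "--threads",
    type=int,
    # Default to every core the OS reports instead of 1.
    default=multiprocessing.cpu_count(),
    help="number of worker threads (default: all available cores)",
)

# With no flags given, the value falls back to cpu_count().
args = parser.parse_args([])
```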

jvollme commented 5 years ago

Personally, I think this is the only acceptable default. This way, a user who made a mistake or did not read the manual only wastes their own calculation time (inconvenient but not destructive) and at least does not mess with everybody else on a public server.

Programs that use all cores by default often create massive problems on public servers, because someone always forgets to set a parameter or read the docs.

alneberg commented 5 years ago

Hmm, I guess you have a point. But I would guess most users are on HPC clusters where core usage/access is already managed, don't you think? On an HPC cluster, a user could run concoct for a week before realising it has only been using a few percent of the available cores. That is the scenario I am more worried about.
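One way to reconcile both concerns is to default to the scheduler's allocation rather than the raw core count. This is a hedged sketch, assuming a SLURM cluster (which exports `SLURM_CPUS_PER_TASK` per job); the helper name is hypothetical:

```python
import multiprocessing
import os

def default_thread_count() -> int:
    """Prefer the scheduler's CPU allocation over the raw core count.

    On a SLURM cluster the default stays within the cores reserved for
    the job; the bare cpu_count() fallback only applies on unmanaged
    machines, where grabbing all cores is less likely to hurt others.
    """
    slurm = os.environ.get("SLURM_CPUS_PER_TASK")
    if slurm and slurm.isdigit():
        return int(slurm)
    return multiprocessing.cpu_count()
```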

jvollme commented 5 years ago

Well, I am neither a developer nor a sys-admin, but it seems the queuing systems on HPC clusters are not always safe in this regard. At my former institute there were sometimes problems with people submitting jobs that specified just 1 core but then took up multiple cores. The queuing system (at least as it was implemented there) used that info to reserve/schedule calculation time for different jobs, but did not check how many cores were actually used, resulting in quite high loads and messing with everybody else's jobs running on that server.

For that reason I tended to curse when tools had thread/core default values taking up all available cores.

alneberg commented 5 years ago

I see! I guess that would count as a bug in the queuing system, though? I would consider your alternative more seriously if concoct were useful in single-threaded mode, but for realistic input data sizes one really needs to use parallelism. I would feel better if the defaults were at least useful...

jvollme commented 5 years ago

In that case it would perhaps be better not to set a default for that parameter at all (making it a required argument)? That way jobs would fail immediately if users forgot to set the parameter (saving them long waits before realizing their mistake), but it would still protect people on servers without job queuing systems, or with ones that are not sufficiently safe...
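A sketch of this fail-fast alternative, again assuming an argparse-based CLI with an illustrative `--threads` flag: argparse exits with a usage error the moment the flag is omitted, so the mistake surfaces before any computation starts.

```python
import argparse

parser = argparse.ArgumentParser(prog="concoct")
parser.add_argument(
    "-t", "--threads",
    type=int,
    # No default at all: a forgotten flag aborts immediately instead of
    # silently running single-threaded or grabbing every core.
    required=True,
    help="number of worker threads (must be set explicitly)",
)
```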