NBISweden / IgDiscover-legacy

Analyze antibody repertoires and discover new V genes from high-throughput sequencing reads
https://www.igdiscover.se
MIT License

Set the number of threads and disable .cache #83

Closed prod-feng closed 6 years ago

prod-feng commented 6 years ago

Hi,

We are running IgDiscover on our high-performance computing (HPC) cluster, which is shared by many users. We use Sun Grid Engine (SGE) as the job management system.

The issue is that there seems to be no easy way to set the number of threads for each job.

For example, I want to run an 8-thread job on one of our compute nodes as a multi-threaded job, which SGE can do easily. However, IgDiscover by default uses all of the CPU cores. This overloads the node it is running on, which is a big problem on a shared cluster: other users' jobs may be running on the same node, and it gets even worse if I run 100 IgDiscover jobs at the same time.

After digging through the code, it seems that IgDiscover uses the Snakefile at {somedir}/site-packages/igdiscover/Snakefile as the configuration that controls the data flow and processes, and that the number of threads can only be changed there. That is fine, but when running IgDiscover on an HPC cluster, it would be nice to be able to control each job conveniently: for example, 12 threads for job A and 20 threads for job B. It might be better to let IgDiscover first read a "Snakefile" in the job's current directory and then read {somedir}/site-packages/igdiscover/Snakefile (or simply skip it), with whichever file is read first taking effect, or vice versa.

Also, there seems to be no way to disable the ~/.cache folder for IgDiscover. In an HPC cluster environment this is not ideal, since it can easily generate hundreds of thousands of files totaling hundreds of GB in the user's HOME folder. A switch for IgBlast (also in the Snakefile) that lets the user disable the cache would be a nice solution.

Best

Feng

MartinMatthewC commented 6 years ago

Hi Feng, thanks for this information. We have already been working on one of these issues (the cache issue), and the fix will be included in the next version of the program. For the other one, we will investigate how to mitigate the problem, either in an update or through some simple config parameters that can be used when running the program normally.

Best regards,

Martin


marcelm commented 6 years ago

Hi, yes, the cache will be disabled by default in the next IgDiscover version. There will be a configuration option to re-enable it, but I expect this to be mostly useful for developers.

Regarding the number of threads that are used, please use igdiscover run -j N, where N is the number of cores that you want to use; see also igdiscover run --help. There should be no need to modify the Snakefile.
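
For the SGE setup described in the original post, the -j option can be tied to the number of slots the scheduler grants, so that each job uses only what it requested. The following is a minimal sketch, not a tested recipe. It assumes a parallel environment named "smp" (the name is site-specific), and that the script is submitted from inside an already initialized analysis directory. The helper function slots_or_default is hypothetical and exists only for this illustration. NSLOTS is the variable SGE sets for jobs that request a parallel environment.

```shell
#!/bin/sh
# Hypothetical SGE job script. The parallel environment name "smp" is an
# assumption; check which name your cluster uses.
#$ -pe smp 8
#$ -cwd

# Print the number of slots granted by SGE (NSLOTS), or 1 when the
# script runs outside the scheduler and NSLOTS is unset or empty.
slots_or_default() {
    echo "${NSLOTS:-1}"
}

threads="$(slots_or_default)"
echo "Using $threads core(s) for igdiscover run"

# Run the pipeline only if igdiscover is on PATH; otherwise report the problem.
if command -v igdiscover >/dev/null 2>&1; then
    igdiscover run -j "$threads"
else
    echo "igdiscover not found on PATH" >&2
fi
```

To give different jobs different core counts (for example 12 for one job and 20 for another), only the slot count in the "-pe" request needs to change, either in the script or on the qsub command line. The -j value then follows automatically.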