genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

Using slurm instead of local #31

Closed Thomieh73 closed 5 months ago

Thomieh73 commented 3 years ago

Hi, I have forked this repo, and to get it working on our HPC cluster I would like to implement slurm usage in the pipeline. But I see that some steps in the pipeline want to download things from the internet. On our cluster it is not possible to download anything once a slurm job is running; that is only possible on the login nodes. So I am thinking of deciding, for each step in the pipeline, whether it runs under slurm or locally (a bit messy, I think).

But in order to do that, I think I need to modify the nextflow.config file further.

A suggestion would be to create something like this, with the profiles test, conda, docker and singularity. I have wrapped the processes just for clarity. Singularity is used on our cluster, so for me it makes sense to include the slurm configuration in the singularity profile by pointing to a slurm configuration file, like this: includeConfig 'conf/slurm.config'. But it might also be possible to make another profile called slurm, which defines 1) what the slurm configuration is and 2) which processes should use slurm to run. The workflow would then be run with the profile settings slurm,singularity, just like test,singularity.

One issue with slurm is that you have to indicate an account. That has to be modified by users that would like to use it on a different platform. So if this is added, then I also need to add a little section on slurm to the main readme documentation.

Let me know what you think

profiles {
  test { includeConfig 'conf/test.config' }
  conda {
    process {
      withName: demultiplex { conda = "$baseDir/conda_envs/demultiplex/environment.yml" }
      ...
      withName: output_documentation { conda = "$baseDir/conda_envs/output_documentation/environment.yml" }
    }
  }
  docker {
    docker.enabled = true
    //process.container = 'nf-core/nanoclust:latest'
    process {
      withName: demultiplex { container = 'hecrp/nanoclust-demultiplex' }
      ...
      withName: output_documentation { container = 'hecrp/nanoclust-output_documentation' }
    }
  }
  singularity {
    includeConfig 'conf/slurm.config' // Pointer to slurm config file.
    singularity.enabled = true
    singularity.autoMounts = true
    //process.container = 'nf-core/nanoclust:latest'
    process {
      withName: demultiplex { container = 'docker://hecrp/nanoclust-demultiplex' }
      ...
      withName: output_documentation { container = 'docker://hecrp/nanoclust-output_documentation' }
    }
  }
}
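
The alternative mentioned above, a separate slurm profile, could be sketched roughly as follows. This is only an illustration of the idea, not the layout of the upstream repository: the profile name slurm and the file conf/slurm.config are the ones proposed in this comment, and the other profiles are assumed to stay as they are.

```groovy
profiles {
  test { includeConfig 'conf/test.config' }

  // Proposed extra profile: all scheduler settings live in their own file,
  // so the singularity profile carries no cluster-specific details.
  slurm { includeConfig 'conf/slurm.config' }

  // conda, docker and singularity profiles as shown above,
  // minus the includeConfig line in the singularity profile.
}
```

With this layout, profiles are combined on the command line (-profile slurm,singularity), so users without slurm can still run with -profile singularity alone.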
larssnip commented 3 years ago

I would be very interested in this! I am trying to make nanoclust run on a SLURM HPC, with no success so far.

The conda profile seems to have plenty of problems, judging by the issues here. I would very much like to use singularity, as docker is not an option on my system. What do I need to edit in the nextflow.config file in order to use singularity? I tried to add singularity using the hints above, but what would the slurm.config file look like?

Thomieh73 commented 3 years ago

Hi Lars, not sure if you still need it, but I'll answer anyway.

I have forked the original repo to my own account here: https://github.com/Thomieh73/NanoCLUST.

If you fork that repo, then you should be almost ready to use it with singularity and slurm.

If you want to implement singularity yourself, then you have to take a look at the file: nextflow.config on my repo.

To implement slurm, you will need to add a profile file with slurm settings, like the one I created in the conf folder of my repo.

When you have done that, you should be able to run nanoclust on your cluster with the command:

nextflow run main.nf -profile slurm,singularity --reads 'YOUR_FASTQ.GZ' --outdir 'YOUR_RESULTS' --db "db/16S_ribosomal_RNA" --tax "db/taxdb/" -work-dir YOUR_WORK_DIR --min_read_length 1300 -resume
Thomieh73 commented 3 years ago

By the way the slurm config file on my repo needs to be adjusted to work on your cluster. But that should be easy.
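
To give an idea of what typically needs adjusting, here is a minimal sketch of a conf/slurm.config. It is not a copy of the file in my repo: the partition name, the account, the queue size and the process name STEP_THAT_DOWNLOADS are all placeholders that you would have to replace with values for your own cluster and pipeline.

```groovy
// conf/slurm.config (sketch; every value below is a placeholder)
process {
  executor = 'slurm'                        // submit each process as a slurm job
  queue = 'normal'                          // placeholder partition name
  clusterOptions = '--account=MY_ACCOUNT'   // placeholder slurm account

  // A step that needs internet access could be kept off the compute nodes
  // by running it with the local executor instead.
  // 'STEP_THAT_DOWNLOADS' is a placeholder, not a real NanoCLUST process name.
  withName: 'STEP_THAT_DOWNLOADS' { executor = 'local' }
}

executor {
  queueSize = 20   // upper limit on jobs queued at the same time
}
```

The account goes into clusterOptions because it is passed straight to the scheduler, so it is the line most users on another platform will have to change.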

larssnip commented 3 years ago

Dear Thomas,

Thanks for this, I will certainly try this out!

LS
