Microbial-Ecology-Group / AMRplusplus

AMR++ is a bioinformatic pipeline meant to aid in the analysis of raw sequencing reads to characterize the profile of antimicrobial resistance genes, or resistome.
https://www.meglab.org/
GNU General Public License v3.0

The pipeline does not seem to fully support Nextflow and Slurm. #35

Open nttg8100 opened 4 months ago

nttg8100 commented 4 months ago

I looked at your pipeline and found a few things that do not seem to work as expected:

  1. Running with Nextflow: When we log in to the HPC login node, we do not need to submit jobs manually; Nextflow handles that for us. We just need to set executor="slurm". For any process, we can look in its work directory, which contains hidden files such as .command.run. This is the file that gets submitted through sbatch, and Nextflow also applies its retry strategy to it, so there is no need to submit anything by hand.
  2. Running with a conda environment: The processes are written without the conda <yaml file> directive in the process body. For reference, see https://github.com/nf-core/ampliseq/blob/master/modules/local/assignsh.nf#L5C5-L5C38, which has the correct conda configuration. With that directive, Nextflow automatically creates the environment from the YAML file and activates it when it runs the corresponding process. Besides, I highly recommend giving each process its own environment containing only the tools it needs. Currently, the pipeline requires manually activating a single environment that contains all of the tools.
  3. Process CPUs and memory: In each process, you can set cpus=4 and memory=8.GB directly instead of running everything as a single job. With a single job, a process that needs only a few CPUs and little memory still holds the whole allocation, so the allocated resources are not used efficiently.
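
To make the three points concrete, here is a minimal sketch of what a single process could look like. It is not taken from AMR++: the process name EXAMPLE_STEP, the envs/example_tool.yaml path, the sample name, and the placeholder script are all hypothetical, and the executor is set inside the process only to keep the example in one file (it could equally be set once in nextflow.config).

```groovy
// main.nf -- illustrative sketch only; all names below are hypothetical.

process EXAMPLE_STEP {
    // Point 1: let Nextflow submit this task to Slurm itself.
    executor 'slurm'

    // Point 2: per-process conda environment built from a YAML file.
    // The path is an assumed location, not a file in the AMR++ repository.
    conda "${projectDir}/envs/example_tool.yaml"

    // Point 3: resources requested for this process only.
    cpus 4
    memory 8.GB

    input:
    val sample_id

    output:
    path "${sample_id}.resources.txt"

    script:
    // Placeholder command; a real process would call its tool here
    // and pass ${task.cpus} as the thread count.
    """
    echo "sample=${sample_id} cpus=${task.cpus} memory=${task.memory}" > ${sample_id}.resources.txt
    """
}

workflow {
    // Hypothetical sample name used only to drive the example.
    EXAMPLE_STEP(Channel.of('sample_A'))
}
```

Launched from the login node with something like nextflow run main.nf -with-conda, Nextflow would be expected to create the environment, write .command.run into the task's work directory, and submit it through sbatch with the requested CPUs and memory. On recent Nextflow versions conda also has to be enabled, either with that flag or with conda.enabled = true in nextflow.config.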