NBISweden / pipelines-nextflow

A set of workflows written in Nextflow for Genome Annotation.
GNU General Public License v3.0

Workflow refactor #71

Closed mahesh-panchal closed 2 years ago

mahesh-panchal commented 2 years ago

It would be nice to refactor the workflow to the following directory structure:

pipelines-nextflow/
├── conf
│   ├── base.config          // Contains base configuration like in nf-core
│   ├── modules.config       // Contains per-process configuration à la nf-core, so publishDir, ext.args, etc. go here.
│   ├── test.config          // Test profile using a minimal test set - output can be nonsense, but the workflow must run to completion
│   └── test_full.config     // Test profile using a realistic data set
├── docs
│   ├── output.md            // Output description
│   ├── README.md
│   └── usage.md             // How to
├── lib
│   └── Template.groovy      // Library of functions to print logo etc.
├── main.nf                  // Primary workflow - calls other workflows based on a parameter (e.g. like a subcommand)
├── modules                  // Process definitions
│   ├── local                // custom definitions - where most of our stuff will be until converted to nf-core format.
│   │   └── samplesheet_check.nf
│   └── nf-core              // Existing nf-core modules we can already use - install with `nf-core modules install <module>`
│       └── modules
│           ├── custom
│           │   └── dumpsoftwareversions
│           │       ├── main.nf
│           │       ├── meta.yml
│           │       └── templates
│           │           └── dumpsoftwareversions.py
│           ├── fastqc
│           │   ├── main.nf
│           │   └── meta.yml
│           └── multiqc
│               ├── main.nf
│               └── meta.yml
├── modules.json
├── nextflow.config         // Base configuration file containing parameter initialisation and standard profiles
├── nextflow_schema.json 
├── README.md 
├── subworkflows            // Workflows used within workflows
│   └── local
│       └── input_check.nf
└── workflows               // Current workflows.
    ├── AbinitioTraining.nf
    ├── AnnotationPreprocessing.nf
    ├── FunctionalAnnotation.nf
    └── TranscriptAssembly.nf

The directory structure follows the nf-core template structure, so porting will take less effort once we start using their code.
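As a rough illustration of what `conf/modules.config` could hold, here is a minimal sketch in the nf-core style. The `params.outdir` parameter and the `FASTQC` selector are assumptions for illustration only; the real process names and options would depend on what ends up in `modules/`.

```nextflow
// conf/modules.config (sketch, not the final file)
// Assumes a `params.outdir` parameter is initialised in nextflow.config.
process {

    // Default publishing rule: one output folder per process,
    // named after the last component of the process name.
    publishDir = [
        path: { "${params.outdir}/${task.process.tokenize(':')[-1].toLowerCase()}" },
        mode: 'copy'
    ]

    // Per-process override: extra command-line arguments are passed
    // to the module through `ext.args` rather than hard-coded in it.
    withName: 'FASTQC' {
        ext.args = '--quiet'
    }
}
```

Keeping `publishDir` and `ext.args` here means the process definitions themselves stay generic, which is what makes swapping in nf-core modules later straightforward.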

I’ve added a Gitpod environment if you want to use that to develop. It’s a web-based development environment with Nextflow, git, Docker, conda, mamba, nf-core, pytest-workflow, and other tools installed. It has around 16 cores, 62 GB of memory, and ~280 GB of storage. The environment is ephemeral, so make sure you push your changes to your fork/branch. You can also install the Gitpod browser extension, which adds a button for opening Gitpod from the GitHub repo.

The first stage is to refactor the code into the directory structure shown above.
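To make the "subcommand" idea for `main.nf` concrete, here is a minimal DSL2 sketch. The parameter name `params.subworkflow`, the accepted values, and the workflow names (`ABINITIO_TRAINING` etc.) are all hypothetical; each file under `workflows/` is assumed to define a named workflow that takes no inputs.

```nextflow
// main.nf (sketch): run one of the existing workflows based on a parameter.
// `params.subworkflow` and the workflow names are assumptions for illustration.
nextflow.enable.dsl = 2

include { ABINITIO_TRAINING        } from './workflows/AbinitioTraining'
include { ANNOTATION_PREPROCESSING } from './workflows/AnnotationPreprocessing'
include { FUNCTIONAL_ANNOTATION    } from './workflows/FunctionalAnnotation'
include { TRANSCRIPT_ASSEMBLY      } from './workflows/TranscriptAssembly'

workflow {
    // Dispatch on the parameter value, like a subcommand.
    if ( params.subworkflow == 'abinitio_training' ) {
        ABINITIO_TRAINING()
    } else if ( params.subworkflow == 'annotation_preprocessing' ) {
        ANNOTATION_PREPROCESSING()
    } else if ( params.subworkflow == 'functional_annotation' ) {
        FUNCTIONAL_ANNOTATION()
    } else if ( params.subworkflow == 'transcript_assembly' ) {
        TRANSCRIPT_ASSEMBLY()
    } else {
        // Fail early with a clear message if the value is missing or misspelled.
        error "Unknown subworkflow '${params.subworkflow}'"
    }
}
```

With something like this in place, a run might look like `nextflow run main.nf --subworkflow annotation_preprocessing -profile test`, assuming a `test` profile is defined in `nextflow.config`.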

Here are nf-core's docs on how to create a new pipeline (https://nf-co.re/tools/#creating-a-new-pipeline). They will likely be more useful later, but feel free to use them now to see how similar the workflow structures are.

Nextflow’s DSL2 docs are at https://www.nextflow.io/docs/latest/dsl2.html

The Nextflow coding practices I wrote for the Carpentries workshop are at https://carpentries-incubator.github.io/workflows-nextflow/15-coding_practices/index.html (I see I need to fix some syntax there, so check back later).