AVR-biosecurity-bioinformatics / freyr

A Nextflow-based metabarcoding pipeline for agricultural biosecurity and biosurveillance

Rework `output` directory #12

Open jackscanlan opened 1 month ago

jackscanlan commented 1 month ago

The ./output directory currently has the following structure after a run:

output
├── logs
│   ├── K739J
│   └── K77JP
├── modules
│   ├── assignment_plot
│   ├── dada_mergereads
│   ├── dada_priors
│   ├── denoise
│   ├── error_model
│   ├── filter_qualplots
│   ├── filter_seqtab
│   ├── joint_tax
│   ├── merge_tax
│   ├── parse_inputs
│   ├── phyloseq_filter
│   ├── phyloseq_merge
│   ├── phyloseq_unfiltered
│   ├── primer_trim
│   ├── read_filter
│   ├── read_tracking
│   ├── split_loci
│   ├── tax_blast
│   ├── tax_idtaxa
│   ├── tax_summary
│   └── tax_summary_merge
├── rds
├── results
│   ├── filtered
│   └── unfiltered
└── temp

logs, rds, results and temp aren't currently used by the pipeline. modules stores every output channel file from every process, in a subdirectory named after the module. This was mostly useful during initial development, for checking the outputs of each process without diving into the work directory, but it isn't easy for users to find the relevant output files unless they know the structure of the pipeline well.

First thoughts:

  1. Remove logs, temp and rds
  2. Create a directory (maybe called rdata or rda) to store the .rda files produced when --rdata true is used, so users don't have to dive into work directories; maybe create this directory dynamically during the run, only if --rdata true is set
  3. Use results to save filtered and unfiltered output files like the original pipeline, but also have folders for the QC plots
  4. Have a pipeline_info folder, like nf-core pipelines do, that contains the trace, DAG, report and timeline files describing the pipeline execution
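
For point 4, a minimal sketch of how this could look in nextflow.config, using Nextflow's built-in trace, dag, report and timeline config scopes. The params.outdir parameter and the file names are assumptions for illustration, not existing freyr settings; the paths would need to match wherever the output directory actually lives.

```groovy
// Sketch only: params.outdir is a hypothetical parameter, not an existing freyr option
params.outdir = "output"

// Task-level trace table (one row per executed task)
trace {
    enabled   = true
    overwrite = true   // replace the file from a previous run instead of erroring
    file      = "${params.outdir}/pipeline_info/execution_trace.txt"
}

// Workflow DAG; the .html extension avoids needing Graphviz installed
dag {
    enabled   = true
    overwrite = true
    file      = "${params.outdir}/pipeline_info/pipeline_dag.html"
}

// HTML execution report (resource usage, task summary)
report {
    enabled   = true
    overwrite = true
    file      = "${params.outdir}/pipeline_info/execution_report.html"
}

// HTML timeline of task execution
timeline {
    enabled   = true
    overwrite = true
    file      = "${params.outdir}/pipeline_info/execution_timeline.html"
}
```

With overwrite enabled, each run replaces the previous run's files; nf-core pipelines instead add a timestamp to each file name so that earlier reports are kept, which could be done here too if run history matters.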