dahak-metagenomics / dahak

benchmarking and containerization of tools for analysis of complex non-clinical metagenomes.
https://dahak-metagenomics.github.io/dahak
BSD 3-Clause "New" or "Revised" License

What do we need for a 1.0 release #64

Open brooksph opened 6 years ago

brooksph commented 6 years ago

charlesreid1 commented 6 years ago
brooksph commented 6 years ago

written and working: Snakefiles

debugging:

to do: Snakefiles

  1. Read filtering (https://github.com/dahak-metagenomics/dahak/blob/master/workflows/read_filtering/Snakefile)

    • [x] Create rule for FastQC before trimming
    • [x] Create rule for FastQC after trimming
    • [x] Create rule for trimming with Trimmomatic
    • [x] Create rule for combining FastQC reports with MultiQC
    • [x] Create config file (maybe YAML or JSON, to specify inputs and outputs)
  2. Assembly (https://github.com/dahak-metagenomics/dahak/blob/master/workflows/assembly/Snakefile)

    • [ ] Split Snakefile (MEGAHIT, SPAdes, and MultiQC)
    • [x] Snakefile 1: MEGAHIT assembly
    • [x] Create rule for assembly with MEGAHIT
    • [x] Create rule for assembly evaluation with QUAST
    • [x] Snakefile 2: SPAdes assembly
    • [x] Create rule for assembly with SPAdes
    • [x] Create rule for assembly evaluation with QUAST
    • [x] Snakefile 3: MultiQC assembly evaluation
    • [x] Create rule for merging QUAST reports into a single report
  3. Mapping and variant calling (in progress: https://github.com/dahak-metagenomics/dahak/issues/46)

  4. Taxonomic classification (see https://github.com/charlesreid1/dahak-flot/blob/master/Snakefile and https://github.com/charlesreid1/dahak-flot/tree/master/rules)

    • [ ] Split Snakefile (sourmash and Kaiju)
    • [x] Snakefile 1: Taxonomic classification with sourmash
    • [x] Snakefile 2: Taxonomic classification with Kaiju
  5. Functional Inference

    • [ ] Snakefile 1: Functional annotation of reads/contigs with mi-faser
    • [ ] Snakefile 2: Identification of antibiotic resistance genes with ABRicate
    • [ ] Snakefile 3: Identification of antibiotic resistance genes with SRST2
  6. Metagenomic comparison

    • [ ] Snakefile 1: Comparison of sourmash sigs representing reads/contigs using sourmash compare
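
For the config-file item under read filtering, here is one possible shape, as a rough sketch only. It assumes JSON; the key names (`samples`, `raw_dir`, `trimmed_dir`, `qc_dir`), the file-name patterns, and the helper `expected_outputs` are all hypothetical and not taken from the existing Snakefiles.

```python
import json

# Hypothetical config: key names and paths are illustrative only,
# not the ones used in the dahak workflows.
EXAMPLE_CONFIG = """
{
  "samples": ["sampleA", "sampleB"],
  "raw_dir": "data/raw",
  "trimmed_dir": "data/trimmed",
  "qc_dir": "data/qc"
}
"""


def expected_outputs(config):
    """List the files a read-filtering run would be asked to produce."""
    outputs = []
    for sample in config["samples"]:
        for read in ("1", "2"):  # paired-end reads (an assumption)
            outputs.append(f"{config['trimmed_dir']}/{sample}_{read}.trim.fq.gz")
            outputs.append(f"{config['qc_dir']}/{sample}_{read}.trim_fastqc.html")
    # One combined report for all samples.
    outputs.append(f"{config['qc_dir']}/multiqc_report.html")
    return outputs


config = json.loads(EXAMPLE_CONFIG)
for path in expected_outputs(config):
    print(path)
```

In a Snakefile, a file like this could be loaded with the `configfile:` directive, and a list built this way could serve as the input of a top-level target rule. Keeping sample names and directories in the config would let the rules themselves stay unchanged between data sets.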

Documentation

Data set generation

Potential punt for 1.X release

charlesreid1 commented 6 years ago

A git clone of this repo currently takes upwards of 5 minutes on my machine. I think this is something we should solve before a 1.0 release. Would you be open to that? If so, I'll start a thread where we can discuss options.

brooksph commented 6 years ago

Yes, we should address that. Please start an issue for that and feel free to add to the list.

charlesreid1 commented 6 years ago

See rules/ dir of dahak-taco.

charlesreid1 commented 6 years ago

Also @brooksph here is the taco "project" that I mentioned: https://github.com/charlesreid1/dahak-taco/projects

charlesreid1 commented 6 years ago

Just to follow up here: