cambiotraining / awd-pathogen-bioinformatics

Materials for "Introduction to Cholera Genomics" course
https://cambiotraining.github.io/awd-pathogen-bioinformatics

Test Nextflow pipelines on cholera course data #18

Open tavareshugo opened 5 months ago

tavareshugo commented 5 months ago

While testing:

bsalehe commented 5 months ago

I have tested both pipelines, and in general they do not deviate significantly from the learning objectives of the materials. Both pipelines generate a MultiQC report, which the materials do not currently cover. Beyond that, the following adjustments could be made to the pipelines and/or the materials:

  1. bacQC-ONT

    • This is probably a MultiQC issue rather than a pipeline issue: the results from Bracken and Kraken are not in the MultiQC report.
    • The materials need a QC section that also covers the output in results/bacQC-ONT/metadata, which summarises the species composition of each sample. The folder also contains a file with summary statistics for the reads of each sample, including coverage.
    • Two of the critical questions for cholera genomic surveillance can be answered by looking at the bracken and kraken output folders in the results folder.
    • The exercise in section 7.4 needs to be updated based on the kraken and bracken outputs.
  2. assembleBAC-ONT

    • The notable outputs that are needed and appear to be missing in this pipeline are:

      • The flye assembly graph file assembly_graph.gfa, which is used in section 9.2.1. Its absence also affects the follow-up exercise in section 9.5.
      • This one is more a matter of updating the materials than adjusting the pipeline: the checkm2 results are in the results/assembleBAC-ONT/metadata folder, which should be reflected in the materials.
      • The pipeline uses both filtlong and rasusa, and the rationale for running both is not clear.
      • It would be nice to remove the fastq files from porechop from the pipeline outputs, as they appear to be unusable in the analysis.
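
For the species composition point under bacQC-ONT, here is a minimal sketch of how a Bracken abundance table could be inspected in an updated exercise. It assumes the standard Bracken output columns (`name`, `fraction_total_reads`, etc.); the function name `top_species` is hypothetical, and the rows below are made-up illustrative values, not results from the course data. The exact file names and layout the pipeline writes are not confirmed here.

```python
import csv
import io


def top_species(bracken_tsv, n=3):
    """Return the n most abundant taxa as (name, fraction) pairs.

    bracken_tsv is any open text handle over a tab-separated Bracken
    abundance table with a header row (assumed standard Bracken columns).
    """
    reader = csv.DictReader(bracken_tsv, delimiter="\t")
    rows = [(r["name"], float(r["fraction_total_reads"])) for r in reader]
    return sorted(rows, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up example table standing in for one sample's Bracken output.
example = io.StringIO(
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Escherichia coli\t562\tS\t300\t100\t400\t0.04\n"
    "Vibrio cholerae\t666\tS\t9000\t500\t9500\t0.95\n"
    "Vibrio mimicus\t674\tS\t80\t20\t100\t0.01\n"
)

result = top_species(example, n=2)
for name, fraction in result:
    print(f"{name}\t{fraction:.2%}")
```

With a real file, the handle would come from `open(path, newline="")` on whichever Bracken table the pipeline writes for a sample.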