dahak-metagenomics / dahak

benchmarking and containerization of tools for analysis of complex non-clinical metagenomes.
https://dahak-metagenomics.github.io/dahak
BSD 3-Clause "New" or "Revised" License

Assembly workflow outputs questions: what to interpret and pipe into comparison workflow? #116

Open nalbright opened 6 years ago

nalbright commented 6 years ago

I successfully ran the assembly workflow by pointing custom_assembly_workflow.json at the outputs of the read filtering workflow (runtime ~36 hours). However, I have a few questions about the outputs of this workflow and the start of the next workflow (Comparison):

  1. In workflows/data there are three directories with results: k21, k33, and k55, each containing assembly graphs, data, and "final_contigs.fasta". What do the k values refer to?
  2. Are the assemblies (final_contigs.fasta) in each of these three directories (k21, k33, and k55) results from megahit or spades? How does the data in these directories differ from the "<megahit/megaspades>contigs.fasta" in /workflows/data? Which is more informative, and which is used for the next workflow(s)?
  3. Do the "k21", "k33", and "k55" directories relate to the k-mer values specified in the json for the Comparison workflow? If so, how? (The k values in the Comparison workflow are different: k21, k31, k51.)
  4. How do we go about selecting outputs from the Assembly workflow for the Comparison workflow? Or would we just want to pipe all the .fasta outputs (contigs.fa for trim2, trim30, Megahit, and Spades) into the Comparison workflow?
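
To make question 4 concrete, here is a minimal sketch of how I would list the candidate FASTA files before deciding which ones to pass on. The function name `group_fastas_by_dir` is my own and not part of dahak, and I am assuming the outputs sit under a directory such as workflows/data and end in `.fasta` or `.fa`:

```python
from collections import defaultdict
from pathlib import Path

# Assumption: assembly outputs are identified by these file extensions.
FASTA_SUFFIXES = {".fasta", ".fa"}


def group_fastas_by_dir(root):
    """Return {directory relative to root: sorted FASTA file names} under root."""
    root = Path(root)
    groups = defaultdict(list)
    # Walk the whole tree so nested directories (e.g. k21, k33, k55) are included.
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in FASTA_SUFFIXES:
            rel_dir = path.parent.relative_to(root).as_posix()
            groups[rel_dir].append(path.name)
    return {d: sorted(names) for d, names in sorted(groups.items())}
```

Calling `group_fastas_by_dir("workflows/data")` would then show which contig files live at the top level and which live in the per-k directories, which is the set I am unsure how to choose from.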

Thanks! Nicolette