Fully worked out tutorial

fedarko / strainFlye

Pipeline for analyzing (rare) mutations in metagenome-assembled genomes

BSD 3-Clause "New" or "Revised" License

Fully worked out tutorial #10

Closed fedarko closed 1 year ago

fedarko commented 2 years ago

Either a markdown file or a jupyter notebook would probs be best. Maybe walking through the SheepGut dataset? (or a small subset of it)

Should showcase pretty much everything in the pipeline, starting with gfa --> fasta, then align, then calling, ...
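As a rough illustration of what the first step (gfa --> fasta) involves, here is a minimal sketch in plain Python. It is not strainFlye's own implementation. It assumes a GFA 1 file in which each segment (`S`) line stores the name in the second tab-separated field and the sequence in the third. The function name `gfa_to_fasta` is hypothetical.

```python
def gfa_to_fasta(gfa_text: str) -> str:
    """Convert the segment (S) lines of GFA 1 text into FASTA text.

    Assumption: every segment stores its sequence inline. Segments whose
    sequence is the placeholder "*" raise an error rather than being guessed.
    """
    records = []
    for line in gfa_text.splitlines():
        # Only segment lines carry sequences; skip headers, links, paths, etc.
        if not line.startswith("S\t"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"Malformed segment line: {line!r}")
        name, seq = fields[1], fields[2]
        if seq == "*":
            raise ValueError(f"Segment {name!r} has no inline sequence")
        records.append(f">{name}\n{seq}\n")
    return "".join(records)


if __name__ == "__main__":
    example = "H\tVN:Z:1.0\nS\tedge_1\tACGT\tLN:i:4\nS\tedge_2\tGG\n"
    print(gfa_to_fasta(example), end="")
```

Link (`L`) lines are ignored here on purpose: the later align and calling steps only need the contig sequences, not the graph topology.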

fedarko commented 2 years ago

Time taken by each command on the full SheepGut dataset

These are informal benchmarks -- they just give an idea of the order of magnitude of time that each step takes. Will continue filling in this list as things get done.

align: 62,941.21 sec [~17.5 hours] (older version), 63,862.81 sec [~17.7 hours] (~May 26)
call @ p = 0.15: 69,792.43 sec [~19.4 hours]
call @ r = 5: 70,744.16 sec [~19.7 hours]
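For reference, the bracketed hour figures above are just the second counts divided by 3,600 and rounded to one decimal place. A small sketch that reproduces them (the dictionary keys are informal labels chosen here, not command names):

```python
# Wall-clock timings in seconds, copied from the list above.
timings_sec = {
    "align (older version)": 62941.21,
    "align (~May 26)": 63862.81,
    "call @ p = 0.15": 69792.43,
    "call @ r = 5": 70744.16,
}


def to_hours(seconds: float) -> float:
    """Convert seconds to hours, rounded to one decimal place."""
    return round(seconds / 3600, 1)


for step, seconds in timings_sec.items():
    print(f"{step}: {seconds:,.2f} sec [~{to_hours(seconds)} hours]")
```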

fedarko commented 2 years ago

Alternatively, add a step to the tutorial (after alignment, before calling) that filters the FASTA file to just long / high-coverage / high-CheckM-quality contigs? That would make the tutorial run faster, and would be a more realistic representation of what probably gets done in practice.
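A filtering step like this could be sketched as follows. This is only an illustration under stated assumptions, not an existing strainFlye command: the function names, the `min_length` / `min_coverage` thresholds, and the `coverage` mapping (contig name to mean coverage, which would have to be computed from the alignment beforehand) are all hypothetical. CheckM quality is left out, since it would need CheckM's own output as an extra input.

```python
from typing import Dict, Iterator, Optional, Tuple


def parse_fasta(fasta_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA text, joining wrapped lines."""
    name, chunks = None, []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token as the contig name.
            name, chunks = line[1:].split()[0], []
        elif name is None:
            raise ValueError("Sequence data found before any FASTA header")
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_contigs(
    fasta_text: str,
    min_length: int,
    coverage: Optional[Dict[str, float]] = None,
    min_coverage: float = 0.0,
) -> str:
    """Keep contigs that are long enough and, if given, covered well enough.

    Contigs missing from `coverage` are treated as having zero coverage.
    """
    kept = []
    for name, seq in parse_fasta(fasta_text):
        if len(seq) < min_length:
            continue
        if coverage is not None and coverage.get(name, 0.0) < min_coverage:
            continue
        kept.append(f">{name}\n{seq}\n")
    return "".join(kept)


if __name__ == "__main__":
    demo = ">a\nACGT\nACGT\n>b\nAC\n>c\nACGTACGTAC\n"
    # Hypothetical thresholds; real ones depend on the dataset.
    print(filter_contigs(demo, min_length=8, coverage={"a": 5.0, "c": 50.0},
                         min_coverage=10.0), end="")
```

Filtering before calling means the slowest steps in the timing list only run on contigs that are likely to be analyzed anyway.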