I have run through the different steps of the pipelines for the 3 datasets, but only on a subset of 5 samples.
Here is how long the steps took on our training instance:
S_aureus
bacqc - 6 min
assembleBAC - 26 min
panaroo - 5 min
iqtree - <1 min
M_tuberculosis
bacqc - 6 min
bactmap - 10 min
pseudogenome -
mask_pseudogenome -
iqtree -
S_pneumo
bacqc - 6 min
assembleBAC - 23 min
bactmap - 6 min
pseudogenome - <1 min
gubbins - <1 min
iqtree - <1 min
funcscan - 3 min
From these timings, it's clear that running the larger workflows on the full dataset is not feasible in a workshop setting, as it would take too long.
My proposal is that they run the workflows on a subset of 5 samples to see what the process looks like, and then analyse the outputs from the preprocessed directory.
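For example, a 5-sample subset samplesheet could be made like this (just a sketch, assuming the pipelines take an nf-core-style samplesheet.csv with a header row; the file names and the profile are placeholders to adapt):

    # keep the header line plus the first 5 samples (one sample per row assumed)
    head -n 6 samplesheet.csv > samplesheet_subset.csv

    # then point the workflow at the subset, e.g. for the mapping step:
    # nextflow run nf-core/bactmap --input samplesheet_subset.csv \
    #     --reference reference.fasta --outdir results_subset -profile singularity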
For the downstream steps like phylogeny, it's probably fine to run them on the full datasets, using the preprocessed data as input (the shell scripts might need tweaking in that case, so that they take the preprocessed directory as input).
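As an illustration, a tweaked phylogeny script could look something like this (a minimal sketch; the preprocessed/ path, the alignment file name, and the IQ-TREE options are assumptions to adjust to the actual directory layout):

    #!/bin/bash
    # use the pre-computed full-dataset alignment from the preprocessed
    # directory instead of the alignment from the 5-sample subset run
    ALIGNMENT="preprocessed/S_pneumo/gubbins/core_alignment.filtered_polymorphic_sites.fasta"

    # maximum-likelihood tree with ultrafast bootstrap; -nt AUTO picks the thread count
    iqtree -s "$ALIGNMENT" -m GTR+G -bb 1000 -nt AUTO -pre results/S_pneumo_full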