cambiotraining / bacterial-genomics

Course materials for "Working with Bacterial Genomes"
https://cambiotraining.github.io/bacterial-genomics/

Pipeline & data tests #7

Open tavareshugo opened 7 months ago

tavareshugo commented 7 months ago

I have run through the different pipeline steps for the 3 datasets, but only on a subset of 5 samples. Here is how long the steps took on our training instance:

From these timings, it's clear that running the larger workflows on the full datasets is not feasible in a workshop setting, as it would take too long. My proposal is that participants run the workflows on a subset of 5 samples to see what the process looks like, and then analyse the outputs from the preprocessed directory.
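As a rough illustration of the 5-sample subset idea, here is a minimal sketch that assumes the workflow input is a CSV samplesheet with one header line and one row per sample. The file names (`samplesheet.csv`, `samplesheet_subset.csv`) and the column names are hypothetical and not taken from the course materials; the demo sheet is only generated so that the snippet runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file names, not taken from the course materials.
full_sheet="samplesheet.csv"
subset_sheet="samplesheet_subset.csv"
n_samples=5

# Create a small demo samplesheet (8 made-up samples) so the example is standalone.
{
  echo "sample,fastq_1,fastq_2"
  for i in $(seq 1 8); do
    echo "sample${i},reads/sample${i}_1.fastq.gz,reads/sample${i}_2.fastq.gz"
  done
} > "$full_sheet"

# Keep the header line plus the first n_samples rows.
head -n "$((n_samples + 1))" "$full_sheet" > "$subset_sheet"

# Report how many samples were kept (header excluded).
echo "Kept $(( $(wc -l < "$subset_sheet") - 1 )) samples in ${subset_sheet}"
```

The subset sheet could then be passed to a workflow in place of the full one, while the full-dataset results are read from the preprocessed directory.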

For the downstream steps like phylogeny, it's probably fine to run them on the full datasets, using the preprocessed data as input (we might need to tweak the shell scripts in that case, so that they use the preprocessed directory as input).

tavareshugo commented 7 months ago

We will limit most of the steps to 5 samples each.

We will use the preprocessed data for looking at some MultiQC reports, Microreact, Pathogenwatch, etc.