Open Dx-wmc opened 1 month ago
Hi, what kind of examples would you like to see?
In general, this workflow is not necessarily intended to be reproduced. It was used to process the data set in this paper: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001421
And we are currently expanding it to process further data sets within the All The Bacteria project: https://allthebacteria.readthedocs.io/en/latest/
The already processed data from the workflow can currently be found in our web repository, which makes it easy to browse and download the data: https://bakrep.computational.bio/
If you are interested in seeing how the workflow is run or what the input data must look like, I can go into this in more detail.
Greetings, Linda
Thank you for your patient reply. I would like to see a brief example of a nextflow running script, including the input metadata and corresponding results. This would be very helpful for me to configure and use.
Okay sure, I'll try to give a short example.
The Nextflow script used can be found here: nextflow/661k.nf
The command to process the required data for the project was as follows:
nextflow run .nextflow/661k.nf -c ./bakrep/nextflow/nextflow.config -profile cluster --samples /shared/new-run/metadata.tsv --setupdir /mnt/scratch/ --data assemblies/ --results results/ -with-conda
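For scripted runs, the same invocation could be assembled from Python. This is only a sketch: the function name `build_nextflow_command` is hypothetical, and the script path, config path, profile, and flags are copied verbatim from the command above rather than taken from any documented interface.

```python
import shlex


def build_nextflow_command(samples, setupdir, data, results):
    """Assemble the argument list for the run shown in this thread.

    The script path, config path, and profile are copied from the command
    above; adjust them to your own checkout and cluster setup.
    """
    return [
        "nextflow", "run", ".nextflow/661k.nf",
        "-c", "./bakrep/nextflow/nextflow.config",
        "-profile", "cluster",
        "--samples", samples,
        "--setupdir", setupdir,
        "--data", data,
        "--results", results,
        "-with-conda",
    ]


cmd = build_nextflow_command(
    samples="/shared/new-run/metadata.tsv",
    setupdir="/mnt/scratch/",
    data="assemblies/",
    results="results/",
)
# Print the shell-quoted command so it can be inspected before launching.
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems when paths contain spaces.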
An example of what the metadata.tsv looks like can be found in the repository: metadata_ena_661K_filtered_head51.tsv
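To see which columns the example metadata file contains before preparing your own, a quick preview along these lines may help. It is a minimal sketch that assumes the file is tab-separated with a header row (not confirmed in this thread); `preview_tsv` is a hypothetical helper name.

```python
import csv


def preview_tsv(lines, n_rows=3):
    """Return (column_names, first n_rows rows as dicts).

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes tab-separated values with a header row.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    # zip with range() stops after n_rows without reading the whole file.
    rows = [row for _, row in zip(range(n_rows), reader)]
    return reader.fieldnames, rows


# Usage (file name taken from the repository example mentioned above):
# with open("metadata_ena_661K_filtered_head51.tsv", newline="") as handle:
#     columns, rows = preview_tsv(handle)
#     print(columns)
#     for row in rows:
#         print(row)
```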
Via the setupdir parameter, you need to provide the path to the databases used by the different tools. Default paths are stored in nextflow/config.nf.
The input data for the workflow consisted of the assembly FASTA files available at the following link: http://ftp.ebi.ac.uk/pub/databases/ENA2018-bacteria-661k/Assemblies/
For each processed assembly file, the following result files will be generated:
Assembly-statistics: sample.assemblyscan.json
CheckM2 quality control: sample.checkm2.json
Bakta annotation: sample.bakta.json, sample.bakta.ffn, sample.bakta.faa, sample.bakta.gbff.gz, sample.bakta.gff3
Taxonomic classification: sample.gtdbtk.json
Multilocus sequence typing: sample.mlst.json
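After a run, the list above can be used to check whether a sample finished completely. The sketch below assumes that "sample" in the file names stands for the sample ID and that all result files sit directly in one results directory. Neither is confirmed in this thread, and the helper names are hypothetical.

```python
from pathlib import Path

# File suffixes taken from the result list above.
RESULT_SUFFIXES = [
    "assemblyscan.json",   # assembly statistics
    "checkm2.json",        # CheckM2 quality control
    "bakta.json",          # Bakta annotation
    "bakta.ffn",
    "bakta.faa",
    "bakta.gbff.gz",
    "bakta.gff3",
    "gtdbtk.json",         # taxonomic classification
    "mlst.json",           # multilocus sequence typing
]


def expected_result_files(sample):
    """List the file names expected for one sample ID."""
    return [f"{sample}.{suffix}" for suffix in RESULT_SUFFIXES]


def missing_results(results_dir, sample):
    """Return the expected file names not present in results_dir.

    Assumes a flat layout; adapt the path logic if results are nested.
    """
    base = Path(results_dir)
    return [
        name for name in expected_result_files(sample)
        if not (base / name).is_file()
    ]


# Usage: an empty list means every expected file was found.
# print(missing_results("results/", "my_sample_id"))
```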
At the moment I am working on an updated version of the workflow to process the latest data from the All The Bacteria project. If you are interested in the whole project, you can find current updates and information here: https://allthebacteria.readthedocs.io/en/latest/faq.html
hi, can you provide some examples for demonstration? The current introduction is a bit confusing to me.