bactopia / bactopia

A flexible pipeline for complete analysis of bacterial genomes
https://bactopia.github.io
MIT License

Pick dataset based on taxonomic classifications #319

Closed by rpetit3 7 months ago

rpetit3 commented 2 years ago

One thing I was still thinking about (probably more of a different issue):

Would it be possible to integrate GTDB into the standard workflow and then annotate based on the GTDB classification? In my case I have some assemblies, but they are not yet assigned to a genus/species/strain. So my current workflow is to classify them with GTDB, then parse the gtdb-summary for genus and species and put these into bakta to have an annotation based on the right classification.
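For illustration, the parsing step of that manual workflow could look roughly like the sketch below. It is not part of Bactopia. It assumes the GTDB-Tk summary is a tab-separated table with `user_genome` and `classification` columns, that the lineage uses rank prefixes such as `g__` and `s__`, and that bakta takes the genus via `--genus` and the species epithet via `--species`. The function names `parse_gtdb_classification` and `bakta_taxon_args` are hypothetical.

```python
import csv
import io


def parse_gtdb_classification(classification):
    """Return (genus, species_epithet) from a GTDB lineage string.

    Ranks that are missing or empty (e.g. "s__") come back as None.
    """
    ranks = {}
    for field in classification.split(";"):
        prefix, _, name = field.strip().partition("__")
        ranks[prefix] = name
    genus = ranks.get("g") or None
    species = ranks.get("s") or None
    # GTDB species are binomials ("Genus epithet"); keep only the epithet,
    # on the assumption that the genus is passed to bakta separately.
    if genus and species and species.startswith(genus + " "):
        species = species[len(genus) + 1:]
    return genus, species


def bakta_taxon_args(summary_tsv_text):
    """Map each genome in a GTDB-Tk summary table to bakta taxon flags."""
    reader = csv.DictReader(io.StringIO(summary_tsv_text), delimiter="\t")
    args = {}
    for row in reader:
        genus, species = parse_gtdb_classification(row["classification"])
        flags = []
        if genus:
            flags += ["--genus", genus]
        if species:
            flags += ["--species", species]
        args[row["user_genome"]] = flags
    return args
```

Note that GTDB genus names can carry suffixes (for example `Bacillus_A`), which this sketch passes through unchanged; whether to strip them before annotation is a separate decision.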

Since annotation is already done within the standard workflow, I would annotate twice when using bactopia. GTDB's classification could possibly be used for species-specific datasets as well. 🤔

But I can't really imagine how much work it would be to implement it that way. 😄

Originally posted by @lfenske-93 in https://github.com/bactopia/bactopia/issues/313#issuecomment-1125982991

rpetit3 commented 2 years ago

Hi @lfenske-93

This is something I've been wanting to do for a while. Exactly as you describe, I would like there to be a step in the main Bactopia pipeline for taxonomic classification. Then using the taxonomic classification, either build new datasets or select existing datasets.

Basically an attempt to mimic --species without knowing the species.

I'm not sure when that will come about though, so I created a new issue!

Cheers, Robert

rpetit3 commented 7 months ago

In v3, the main Bactopia pipeline no longer includes "species-specific datasets", so I think it's OK to close this issue.