Add bakta

fmalmeida commented 3 years ago

Study the best way to implement Bakta in the pipeline.

It would be nice to let users choose either Prokka or Bakta as the base annotation, depending on their needs.

Check whether it is possible to add it.

fmalmeida commented 2 years ago

Bakta outputs are very similar to Prokka's, but its annotation is more reliable. The addition therefore seems straightforward:

Create a module for Bakta so users can use either Prokka or Bakta
If using Bakta, select the outputs that match the Prokka outputs used throughout the pipeline. The rest of the pipeline then stays exactly the same, consuming the GFF and TSV from either Bakta or Prokka
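
A minimal sketch of that output selection, written in Python for illustration only (it is not the pipeline's actual code). It assumes Prokka writes `<prefix>.gff` and `<prefix>.tsv` while Bakta writes `<prefix>.gff3` and `<prefix>.tsv`; the function name `annotation_outputs` and the mapping are hypothetical.

```python
from pathlib import Path

# Assumed file extensions: Prokka writes <prefix>.gff and <prefix>.tsv,
# while Bakta writes <prefix>.gff3 and <prefix>.tsv.
ANNOTATION_SUFFIXES = {
    "prokka": {"gff": ".gff", "tsv": ".tsv"},
    "bakta": {"gff": ".gff3", "tsv": ".tsv"},
}


def annotation_outputs(annotator: str, outdir: str, prefix: str) -> dict:
    """Return the GFF and TSV paths that downstream steps would consume."""
    try:
        suffixes = ANNOTATION_SUFFIXES[annotator]
    except KeyError:
        raise ValueError(f"unknown annotator: {annotator!r}") from None
    base = Path(outdir)
    # Same keys for both annotators, so downstream code does not need to
    # know which tool produced the files.
    return {kind: base / f"{prefix}{suffix}" for kind, suffix in suffixes.items()}
```

Because both annotators expose the same `gff` and `tsv` keys, the downstream steps can stay unchanged regardless of which tool ran.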

One thing to consider:

Bakta depends on a heavy database, so it would not be appropriate to put it into the Docker image
Therefore, to add Bakta, the pipeline itself must be reconfigured to have a module that creates all the databases used throughout the pipeline
Then, make the pipeline receive a parameter setting the path to this database, which would make it easier for users to keep the databases up to date
This would also mean the Docker images contain only the tools and not the database files, making them smaller and making it possible to use the pipeline with different profiles such as conda, docker or singularity

To recap:

To add bakta it would be necessary to:

make the pipeline use tools from conda, docker or singularity with the databases being set in a custom user path
create a module to automatically download and format the databases for the pipeline
re-configure the pipeline to use the database files from this database directory provided by the user
add bakta
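
As a rough illustration of the user-provided database directory idea, the Python sketch below checks which expected databases are missing under a given root before a run starts. The function name `missing_databases` and any subdirectory names passed to it are hypothetical; the real layout would be whatever the database-building module produces.

```python
from pathlib import Path
from typing import Iterable, List


def missing_databases(db_root: str, required: Iterable[str]) -> List[str]:
    """List the required database subdirectories not found under db_root.

    The subdirectory names are supplied by the caller; they are placeholders
    here, not the pipeline's actual database layout.
    """
    root = Path(db_root)
    return [name for name in required if not (root / name).is_dir()]
```

A call such as `missing_databases("/data/pipeline_dbs", ["prokka_db", "resfinder_db"])` (hypothetical path and names) would return the entries still to be downloaded, letting the pipeline fail early with a clear message.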

fmalmeida commented 2 years ago

Now that the pipeline has been restructured, this issue can become a reality.

Since the Bakta database is huge, users will have to download it themselves instead of having the pipeline download and format it, as each system or institute will have its own way of handling such a massive download.

Thus, to trigger annotation with Bakta, users simply have to:

Download the database
Set the path to the Bakta database with --bakta_db

When using this parameter, the pipeline should automatically trigger bakta instead of prokka.
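
The switch described above can be sketched in Python as follows. This is an illustration of the selection logic only, not the pipeline's implementation; the function name `build_annotation_command` and the default thread count are hypothetical, and the command-line flags are assumed from the Bakta (`--db`, `--output`, `--prefix`, `--threads`) and Prokka (`--outdir`, `--prefix`, `--cpus`) CLIs.

```python
from typing import List, Optional


def build_annotation_command(
    fasta: str,
    outdir: str,
    prefix: str,
    bakta_db: Optional[str] = None,
    threads: int = 4,
) -> List[str]:
    """Return a Bakta command when a database path is given, else a Prokka one."""
    if bakta_db:
        # A database path was supplied, so Bakta replaces Prokka.
        return [
            "bakta", "--db", bakta_db,
            "--output", outdir, "--prefix", prefix,
            "--threads", str(threads), fasta,
        ]
    # No Bakta database: fall back to the default Prokka annotation.
    return [
        "prokka", "--outdir", outdir, "--prefix", prefix,
        "--cpus", str(threads), fasta,
    ]
```

Keying the choice on the presence of the database path means users need no separate "use Bakta" flag: providing `--bakta_db` is enough.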

fmalmeida commented 2 years ago

Finally, after a long time, the workflow now runs properly from top to bottom when using Bakta. For release, it is now required to:

[x] Update the docs to explain the Bakta option. How to use it? What to expect?
[x] Update version on manifest
[x] Update the automatic reports so they know whether the user ran Prokka or Bakta. Check that everything renders well.
[x] When using Prokka, the automatic report must know whether the pipeline ran with additional HMM libraries for Prokka, and which ones were used (from those available when building the databases).
[ ] To think about: if using Bakta, is there additional parsing of outputs that we could do to give users more information?