epi2me-labs / wf-metagenomics

Metagenomic classification of long-read sequencing data

Adding custom seqs to existing databases #89

Open annabel-dekker opened 3 months ago

annabel-dekker commented 3 months ago

Ask away!

I have figured out a way to add in-house sequences to existing databases, thanks to a helpful thread on the kraken2 GitHub (create nodes.dmp and names.dmp files for custom database · Issue #436 · DerrickWood/kraken2 · GitHub). We are thinking of making this a standard practice: users would run your pipeline and add their own sequences to the existing databases. The way your tool automatically builds and stores existing databases has been helpful for this. Does the database build always check for the latest updates? If so, would it be possible to run only the database-building part of your pipeline, so that a user can edit the databases manually (or we automate this as well) and then continue the metagenomics/16S pipeline with the custom files? I hope I'm making sense! Let me know whether you see potential in this idea.
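For readers following along, here is a rough sketch of the kind of manual edit discussed in that kraken2 thread. It is not part of wf-metagenomics. The helper names (`tag_fasta_header`, `nodes_dmp_line`, `names_dmp_line`) and the taxon ID, parent ID, and name in the usage lines are made up for illustration, and the `.dmp` field layout assumes the NCBI-style `\t|\t` separators, so check it against your own taxonomy files before relying on it.

```python
def tag_fasta_header(header: str, taxid: int) -> str:
    """Add a kraken:taxid tag to the sequence ID of a FASTA header line."""
    seq_id, _, description = header.lstrip(">").partition(" ")
    tagged = f">{seq_id}|kraken:taxid|{taxid}"
    return f"{tagged} {description}" if description else tagged


def nodes_dmp_line(taxid: int, parent_taxid: int, rank: str) -> str:
    """Format one nodes.dmp entry (assumed NCBI-style layout; only the
    first three fields are filled, the rest is a placeholder)."""
    return f"{taxid}\t|\t{parent_taxid}\t|\t{rank}\t|\t-\t|\n"


def names_dmp_line(taxid: int, name: str) -> str:
    """Format one names.dmp entry with an empty 'unique name' field
    (assumed NCBI-style layout)."""
    return f"{taxid}\t|\t{name}\t|\t\t|\tscientific name\t|\n"


# Hypothetical values: a custom taxon 9000001 placed under parent 2.
print(tag_fasta_header(">inhouse_001 lab isolate", 9000001))
print(nodes_dmp_line(9000001, 2, "species"), end="")
print(names_dmp_line(9000001, "In-house isolate 001"), end="")
```

The generated lines would be appended to the taxonomy's nodes.dmp and names.dmp, and the tagged FASTA file added with `kraken2-build --add-to-library <file> --db <db>` before running `kraken2-build --build --db <db>`. This assumes the library and taxonomy files of the database are still available to rebuild from.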

nggvs commented 3 months ago

Hi @annabel-dekker, thank you for the suggestion! Glad to hear you have found our tool useful! We don't automatically check the databases for updates, because users could get different results if they used slightly different databases. However, we do offer the option to provide a custom database if they want to use a more recent one. At the moment we only generate the kraken2 database for SILVA, but we may consider doing this from custom files supplied by the user; in that case it would also be useful to let the user provide their own sequences to modify existing databases.

Thanks for the suggestion! We will take it into account!