FOI-Bioinformatics / flextaxd

FlexTaxD (Flexible Taxonomy Databases) - Create, add, merge different taxonomy sources (QIIME, GTDB, NCBI and more) and create metagenomic databases (kraken2, ganon and more)
GNU General Public License v3.0

Use of custom genomes retrieved from databases other than NCBI and GTDB in FlexTaxD #52

Closed magibc closed 2 years ago

magibc commented 2 years ago

Hello,

First of all, I would like to congratulate you on the release of FlexTaxD, which will facilitate taxonomic exchange.

I have a quick question to find out whether your tool could help with my workflow.

I performed a metagenomic study and I classified the taxonomy through Kraken2 + Bracken, Metaphlan3 and Kaiju.

For Metaphlan3, a Newick format tree is available from the Metaphlan developers.

However, for Kraken2 I classified the taxonomy against UHGG v2.0, and for Kaiju against the proGenomes database. Neither database's website provides a tree.

Therefore, based on your publication in Bioinformatics and on https://doi.org/10.1371/journal.pcbi.1009947, it seems to me that I could use FlexTaxD with custom sets of genomes retrieved from UHGG v2.0 and proGenomes to build a taxonomy in Newick format, and then find some other software to construct a Newick tree that I can use in the R phyloseq package to study UniFrac distances for beta-diversity analysis. Would that be possible?

Sorry if this is not an interesting question; I'm a fledgling in bioinformatics.

Thanks in advance for your comments,

Magi.

davve2 commented 2 years ago

Dear @magibc, I thought I had answered this question; sorry for the delay!

FlexTaxD has a built-in Newick print function. In addition to flextaxd, it requires inquirer and biopython to be installed in your environment. Build a flextaxd database from the names/nodes files that you used to build the kraken2 database, and then run flextaxd -db <databasename> --visualise_node root

I'm not sure I understood your question correctly, though. Do you want a tree that shows only the nodes that were classified? If so, there is no automatic function for that at the moment.
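For anyone who does want a tree restricted to the classified nodes, the pruning can be sketched by hand in a few lines of Python. This is only an illustration under stated assumptions, not a FlexTaxD function: the helper name pruned_newick, the child-to-parent dictionary, and the toy taxids and names are all hypothetical, and spaces in names are replaced with underscores because unquoted Newick labels cannot contain spaces.

```python
from collections import defaultdict


def pruned_newick(parent_of, names, classified, root="1"):
    """Newick string with only the classified nodes and their ancestors.

    Hypothetical helper for illustration; not part of FlexTaxD.
    """
    # Walk from each classified node up to the root and remember the path.
    keep = {root}
    for node in classified:
        while node not in keep:
            keep.add(node)
            node = parent_of[node]

    # Build a parent -> children map restricted to the kept nodes.
    children = defaultdict(list)
    for node in keep:
        if node != root:
            children[parent_of[node]].append(node)

    def render(node):
        # Unquoted Newick labels cannot contain spaces, so use underscores.
        label = names.get(node, node).replace(" ", "_")
        kids = sorted(children.get(node, []))
        if not kids:
            return label
        return "(" + ",".join(render(k) for k in kids) + ")" + label

    return render(root) + ";"


# Toy taxonomy (made-up taxids): child taxid -> parent taxid.
parent_of = {"1": "1", "2": "1", "3": "2", "4": "2", "5": "1"}
names = {"1": "root", "2": "Bacteria", "3": "Escherichia coli",
         "4": "Bacillus subtilis", "5": "Archaea"}

print(pruned_newick(parent_of, names, {"3", "5"}))
# prints ((Escherichia_coli)Bacteria,Archaea)root;
```

Taxid 4 is dropped because nothing was classified to it. Names containing parentheses, commas or colons would additionally need quoting before the tree is read by other tools.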

magibc commented 2 years ago

Thanks @davve2. Don't worry about the delay.

1) So, using the names.dmp and nodes.dmp files that I used for kraken2 and kaiju, can I produce a tree in Newick format? And can I then save this tree for downstream analysis?

2) In addition, I cannot find in the wiki the command for providing the names and nodes files as input.

3) I'm only interested in databases other than NCBI and GTDB. I have used the UHGG v2.0 and proGenomes databases. I understand that it is also possible to work only with a custom database, is that right?

Thanks again,

Magí

davve2 commented 2 years ago

The tree will be printed to stdout so you can pipe the tree and save it to a file.

Names and nodes files are in the NCBI "format", so they are used with

flextaxd -db <databasename> --taxonomy_file nodes.dmp --taxonomy_type NCBI (names.dmp will be automatically detected when the two are in the same directory and the correct input type is selected.)

To build the database from GTDB, use QIIME as the "format/type" together with the TSV taxonomy file provided by GTDB.

To get names.dmp and nodes.dmp files for kraken2, you can export them using --dump and --dbprogram kraken2. To get the Newick tree, follow the description in my previous message.

Example: flextaxd -db <databasename> --dump -o taxonomy --dbprogram kraken2

If anything goes wrong, I suggest using --verbose to get more information about your database build. That would help me understand what went wrong if you run into problems along the way.

magibc commented 2 years ago

Hello @davve2 ,

Thank you for your quick reply. I will try it. It seems it will do what I need.

OK, thank you for the tip about using the --verbose option. Thanks again for your help and hints.

For both "special" databases (UHGG and proGenomes) I already have the nodes.dmp and names.dmp files, which I assume follow the NCBI format (I will find out).
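A quick way to find out is to check the field layout of nodes.dmp before building anything. The sketch below assumes the standard NCBI taxdump layout, where fields are separated by a tab, a pipe and a tab, each line ends with a tab and a pipe, and the first two fields are the taxid and the parent taxid. The function names are made up for this example and are not part of FlexTaxD.

```python
def parse_dmp_line(line):
    """Split one line of an NCBI-style .dmp file into stripped fields."""
    line = line.rstrip("\n")
    # NCBI dump lines end with a tab followed by a pipe; drop that terminator.
    if line.endswith("\t|"):
        line = line[:-2]
    return [field.strip() for field in line.split("\t|\t")]


def looks_like_ncbi_nodes(lines):
    """True if every non-empty line has a numeric taxid, a numeric parent taxid and a rank."""
    checked = 0
    for line in lines:
        if not line.strip():
            continue
        fields = parse_dmp_line(line)
        if len(fields) < 3 or not fields[0].isdigit() or not fields[1].isdigit():
            return False
        checked += 1
    # An empty file does not count as valid.
    return checked > 0


# Two lines written the way NCBI nodes.dmp writes them (truncated to three fields).
sample = [
    "1\t|\t1\t|\tno rank\t|\n",
    "2\t|\t1\t|\tsuperkingdom\t|\n",
]
print(looks_like_ncbi_nodes(sample))
print(parse_dmp_line(sample[1]))

# For a real file (the path is an assumption):
# with open("nodes.dmp") as handle:
#     print(looks_like_ncbi_nodes(handle))
```

If the check fails, the files are probably in another layout (for example a plain TSV), and --taxonomy_type NCBI would likely not be the right choice for them.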

Then I will try to construct a Newick tree from those files with:

flextaxd -db <databasename> --taxonomy_file nodes.dmp --taxonomy_type NCBI

followed by:

flextaxd -db <path_to_db> --visualise_node root

I will give you feedback ASAP. First I need my admin to install FlexTaxD.

Thanks again,

Magí.

davve2 commented 2 years ago

That should work

Add > newick.txt at the end of command 2 so that you don't lose your tree in the terminal!
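Once the tree is saved, it can be worth a quick sanity check before loading it into R. The Python sketch below only tests that the text ends with a semicolon and that its parentheses are balanced outside quoted labels; it is not a full Newick parser, the function name is made up for this example, and it assumes newick.txt contains nothing but the tree.

```python
def newick_is_balanced(text):
    """Rough check: trailing semicolon and balanced parentheses outside quotes."""
    text = text.strip()
    if not text.endswith(";"):
        return False
    depth = 0
    in_quote = False
    for ch in text:
        if ch == "'":
            # Single quotes delimit labels; a doubled quote toggles twice.
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    # A closing parenthesis appeared before its opening one.
                    return False
    return depth == 0 and not in_quote


print(newick_is_balanced("((A,B)C,D)root;"))
print(newick_is_balanced("((A,B)C,D;"))

# For the saved file (the file name follows the redirect suggested above):
# with open("newick.txt") as handle:
#     print(newick_is_balanced(handle.read()))
```

A False result usually means the output was truncated or that something other than the tree ended up in the file.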

I recommend using (ana)conda, which creates a local repository within your home directory and should not require admin rights. Check out https://docs.conda.io/en/latest/miniconda.html and https://bioconda.github.io/

If your admin installs miniconda, you should be able to install any bioinformatics-related program available in conda; it is placed locally in your home folder and should not require admin rights.

David