Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

--write_single_copy_genes for older version of GTDB database #363

Closed by shubavarshini 2 years ago

shubavarshini commented 2 years ago

Hello there,

I'm building a database for microbiome data analysis. In 2019 I started with GTDB release 89 to create an in-house pangenome database and run other optimization analyses. I used GTDB-Tk v1.0.2 to taxonomically classify MAGs (metagenome-assembled genomes) against GTDB r89. I now need the marker genes predicted for the MAGs, but that feature is only available in GTDB-Tk >= v1.4.0. I redid the classification with the latest version of GTDB-Tk (v1.7.0), but v1.7.0 does not support GTDB r89, and the results clearly differ between v1.0.2 and v1.7.0. My questions:

  1. I cannot change the GTDB release; I have to use r89, and I need the marker gene sequences for the MAGs. Is there any way to run a GTDB-Tk version that has the "--write_single_copy_genes" option against GTDB r89? Do you think GTDB-Tk v1.4.0 would work with GTDB r89?
  2. Is there a separate script to extract marker genes using GTDB r89? If you can point me to the logic, I can write a script myself to extract them from the files generated by GTDB-Tk v1.0.2 classify_wf.

Any guidance would be very helpful. Thank you.

shubavarshini commented 2 years ago

Hey, I figured out a solution. I used GTDB-Tk v1.4.0, which still supports GTDB r89; my results did not change, and the "--write_single_copy_genes" option ran as well. If you read this issue and have any suggestions or opinions, I'd be happy to hear them.
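For anyone who wants to check that classifications are unchanged between two GTDB-Tk runs, here is a minimal sketch of one way to compare them. It assumes each run produced a tab-separated summary file with `user_genome` and `classification` columns (as in the usual `gtdbtk.bac120.summary.tsv`); the function names are hypothetical and the demo rows are made up for illustration, not taken from my data.

```python
import csv
import io


def load_classifications(handle):
    """Map user_genome -> classification from a summary TSV file handle.

    Assumes the TSV has 'user_genome' and 'classification' columns.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["user_genome"]: row["classification"] for row in reader}


def diff_classifications(old, new):
    """Return {genome: (old_value, new_value)} for every genome whose
    classification differs, or that appears in only one of the two runs."""
    changed = {}
    for genome in sorted(set(old) | set(new)):
        before, after = old.get(genome), new.get(genome)
        if before != after:
            changed[genome] = (before, after)
    return changed


# Made-up demo data standing in for two summary files.
OLD_TSV = (
    "user_genome\tclassification\n"
    "MAG1\td__Bacteria;p__Firmicutes\n"
    "MAG2\td__Bacteria;p__Proteobacteria\n"
)
NEW_TSV = (
    "user_genome\tclassification\n"
    "MAG1\td__Bacteria;p__Firmicutes\n"
    "MAG2\td__Bacteria;p__Bacteroidota\n"
)

old_run = load_classifications(io.StringIO(OLD_TSV))
new_run = load_classifications(io.StringIO(NEW_TSV))
changed = diff_classifications(old_run, new_run)

for genome, (before, after) in changed.items():
    print(f"{genome}: {before} -> {after}")
```

With real output, replace the `io.StringIO(...)` handles with `open(path, newline="")` on the two summary files; an empty `changed` dict means the two runs agree on every genome.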

Closing the issue for now.