Open aksha19n opened 1 month ago
Hello!
To reply properly I'd need a little more information, such as:
Best, Matteo
Hi Matteo,
I installed KEMET on a UNIX system through conda and ran the script add_taxonomy_from_gtdb-tk.py. I ran my genomes through the classify microbes with GTDB-Tk-v2 3.2 workflow available on KBase. The output files from that were used to run the gtdb to ncbi majority vote script, which gave me a .tsv file containing the ID number, GTDB classification and NCBI classification. I made sure that the sample/ID names are the same in the .tsv file and the genomes.instruction file before running the add taxonomy script.
Hope this helps
Thank you!
Thanks for the extra details!
I've only tested the script with input obtained from the gtdb-tk command line (so a difference could arise from that aspect).
The same goes for the gtdb-to-ncbi script, which depends on a specific version of the GTDB database. Right now the add_taxonomy_from_gtdb-tk.py script is known to work with the 2022 "GTDB R07-RS207" release, as well as the 2022 NCBI taxonomy.
I can't rule out that major changes in taxonomy have happened since then (I remember some changes regarding Firmicutes to Bacillota, maybe?). That would require fixing the correspondence from NCBI to KEGG BRITE taxonomy.
Otherwise, my suspicion would be the file extension of your genome/MAG files (whether it is .fasta, .fa, or .fna), as the script requires it and it is specified through the -f argument when running it.
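A quick way to see which extension the genome files actually carry, before choosing the value for -f, is sketched below. This is a hypothetical helper and not part of KEMET; the function name and the idea of keeping all genomes in a single folder are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path

# Hypothetical helper, not part of KEMET: counts genome files per extension
# so the value passed via -f can be matched to what is actually on disk.
GENOME_EXTENSIONS = {".fasta", ".fa", ".fna"}


def count_genome_extensions(genome_dir):
    """Return a Counter of the .fasta/.fa/.fna files found in genome_dir."""
    counts = Counter()
    for path in Path(genome_dir).iterdir():
        # Only regular files with one of the three expected suffixes count.
        if path.is_file() and path.suffix in GENOME_EXTENSIONS:
            counts[path.suffix] += 1
    return counts
```

If the counter shows, say, only .fna files while the script was run with the .fasta extension, that mismatch alone could explain why no genome gets matched.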
Best regards, Matteo
Hi Matteo,
Thank you!
The file extensions and names match in the genomes.instruction file and the output file from GTDB. I downloaded the metadata files for r207, ran the gtdb to ncbi script, and used its output file to run add_taxonomy, and it worked. However, when I ran the kemet.py code I ran into an error:
File "kemet.py", line 781, in taxonomy_filter
for line in v[i_start+1:]:
UnboundLocalError: local variable 'i_start' referenced before assignment
Could you kindly guide me with this error?
> Hi Matteo, Thank you! The file extensions and names match in the genomes.instruction file and the output file from GTDB. I downloaded the metadata files for r207 and ran the gtdb to ncbi script and used the output file from that to run the add_taxonomy and it worked.
Nice to know! Could you specify what you did precisely? This could serve as a temporary fix until I modify a few things 🙃
I've now seen that KEGG BRITE was updated to reflect the changes in the NCBI taxonomy, as expected, so it will take a couple of checks to bring the add_taxonomy script up to date.
> However, when i ran the kemet.py code i ran into an error File "kemet.py", line 781, in taxonomy_filter for line in v[i_start+1:]: UnboundLocalError: local variable 'i_start' referenced before assignment
> Could you kindly guide me with this error?
Do you have the KEGG BRITE file br08601.keg in your working folder? It should be downloaded automatically when setting the working folder via the set_kemet_working-directory.py script.
If it is missing, it needs to be placed there. If it is present, I'll need to check whether that file is still formatted the way it was in 2022.
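For context, an UnboundLocalError like the one quoted above is what Python raises when a variable is assigned only inside a branch that never runs. The sketch below is not the actual taxonomy_filter code from kemet.py. It assumes, purely for illustration, that i_start is set only when a line matching some marker is found, so an empty or differently formatted input would leave it unassigned. The function names and the marker are hypothetical.

```python
def lines_after_marker(lines, marker):
    """Illustrative only: i_start is assigned only if marker is found."""
    for i, line in enumerate(lines):
        if line.startswith(marker):
            i_start = i
            break
    # If no line matched, i_start was never assigned and the next
    # statement raises UnboundLocalError.
    return lines[i_start + 1:]


def lines_after_marker_safe(lines, marker):
    """Same lookup, but fails with an explicit message when nothing matches."""
    i_start = None
    for i, line in enumerate(lines):
        if line.startswith(marker):
            i_start = i
            break
    if i_start is None:
        raise ValueError(f"marker {marker!r} not found in input")
    return lines[i_start + 1:]
```

Under that assumption, the error would point to the input (a missing taxon or a changed file layout) rather than to the slicing line itself.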
Best regards, Matteo
I am trying to run this script, but it keeps returning "The genomes.instruction file has been updated with 0 genome(s) taxonomy indications, using '.fasta' extension". Could you please tell me if there is anything I can do to fix it?