Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

Total genome counter in log is off (warning message) #585

Open GabeAl opened 5 months ago

GabeAl commented 5 months ago

Funny minor glitch. I think it's using the counter for "genomes passing amino acid filter" instead of the total. This leads to fun warnings like: `[2024-04-25 14:25:09] WARNING: 2 of 1 genomes have a warning (see summary file).`

The log shows it knows "2 genomes identified as archaeal", but it also reports "1 archaeal user genomes have amino acids in <10.0% of columns in filtered MSA". That genome also didn't get classified through ANI screening and ended up as "Unclassified Archaea". The warning likely takes its total from that filter-related counter instead of the full genome count. The other genome gets classified down to the class level, which is fine.



user_genome | classification
-- | --
genome_827 | d__Archaea;p__Thermoproteota;c__DRAE01;o__;f__;g__;s__
genome_17632 | Unclassified Archaea
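
To illustrate what I suspect is going on, here is a minimal sketch. This is not GTDB-Tk's actual code: the function and variable names are hypothetical, and the counter values are simply the ones from the log above. In this run the number of genomes passing and failing the amino acid filter are both 1, so either counter would produce the same message.

```python
# Hypothetical sketch of the suspected counter mix-up (not GTDB-Tk source code).

def format_warning(n_with_warning: int, n_genomes: int) -> str:
    """Build the log message using whatever denominator is passed in."""
    return f"{n_with_warning} of {n_genomes} genomes have a warning (see summary file)."

# Values taken from the log in this issue.
n_archaeal_total = 2   # "2 genomes identified as archaeal"
n_filter_counter = 1   # assumed: a filter-related counter (1 passed, 1 failed)
n_with_warning = 2     # numerator shown in the warning

# Suspected behaviour: the filter-related counter is used as the total.
suspected = format_warning(n_with_warning, n_filter_counter)

# Expected behaviour: the total number of genomes is used.
expected = format_warning(n_with_warning, n_archaeal_total)

print(suspected)
print(expected)
```

If that is roughly what is happening, the fix would presumably be to pass the total genome count as the denominator, but I have not checked the source to confirm.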

## Environment
- [ ] Installed via pip (include the output of `pip list`)
- [x] Using a conda environment (include the output of `conda list && conda list --revisions`)
- [ ] Using a Docker container (include the `IMAGE ID` of the container)

## Server information
- CPU: AMD Ryzen Threadripper PRO 5995WX 64-Cores
- RAM: MemTotal:       1056641580 kB
- OS: Clear Linux (Kernel 6.7.9)

## Debugging information
- [x] `gtdbtk.log` has been included (drag and drop the file to upload).
- [ ] Genomes have been included (if possible, and there are few).

Log:
[gtdbtk.log](https://github.com/Ecogenomics/GTDBTk/files/15129984/gtdbtk.log)

## Additional comments
Not a big deal and this probably doesn't affect any inner workings.

pchaumeil commented 4 months ago

Hi, thanks for your feedback. I will have a look at this. As you said, this will not affect the final GTDB-Tk results.