apcamargo / taxopy

A Python package for obtaining complete lineages and the lowest common ancestor (LCA) from a set of taxonomic identifiers
https://apcamargo.github.io/taxopy
GNU General Public License v3.0
40 stars, 5 forks

support synonym or equivalent name when calling taxid_from_name #5

Open ShannonDaddy opened 2 years ago

ShannonDaddy commented 2 years ago

Hi, when I call the taxid_from_name function to get a taxid, I get an empty list and a warning.

My code:

    import taxopy

    ncbi_taxdb_dir = "database/ncbi_taxonomy"
    taxdb = taxopy.TaxDb(
        nodes_dmp=f"{ncbi_taxdb_dir}/nodes.dmp",
        names_dmp=f"{ncbi_taxdb_dir}/names.dmp",
        merged_dmp=f"{ncbi_taxdb_dir}/merged.dmp",
        keep_files=True,
    )
    taxid_list = taxopy.taxid_from_name('Lactobacillus fermentum', taxdb)
    print(taxid_list)

The console output:

    []
    C:\Users\AppData\Local\Programs\Python\Python38\lib\site-packages\taxopy\utilities.py:54: Warning: The input name was not found in the taxonomy database.
      warnings.warn("The input name was not found in the taxonomy database.", Warning)

Then I checked names.dmp and found that 'Lactobacillus fermentum' is listed as a synonym; the scientific name is 'Limosilactobacillus fermentum'. When I use the scientific name in the code, the output is fine.
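For anyone hitting the same warning, here is a minimal sketch of resolving a synonym by reading names.dmp-style rows directly. It is not part of taxopy: `build_name_index` and `NAME_CLASSES` are hypothetical names, and the two sample rows are illustrative lines written in the names.dmp layout (fields separated by a tab, a pipe, and a tab), not a copy of the real file.

```python
from collections import defaultdict

# Name classes to index (assumption: these are the classes of interest here).
NAME_CLASSES = {"scientific name", "synonym", "equivalent name"}


def build_name_index(lines, name_classes=NAME_CLASSES):
    """Map each name to the list of taxids that carry it (hypothetical helper)."""
    index = defaultdict(list)
    for line in lines:
        # Row layout: tax_id <tab>|<tab> name_txt <tab>|<tab> unique name <tab>|<tab> name class <tab>|
        fields = line.rstrip("\t|\n").split("\t|\t")
        if len(fields) < 4:
            continue  # skip malformed rows
        taxid, name, _unique_name, name_class = fields[:4]
        if name_class in name_classes:
            taxid = int(taxid)
            if taxid not in index[name]:
                index[name].append(taxid)
    return dict(index)


# Illustrative rows; with a real dump, pass an open file handle instead.
sample = [
    "1613\t|\tLimosilactobacillus fermentum\t|\t\t|\tscientific name\t|\n",
    "1613\t|\tLactobacillus fermentum\t|\t\t|\tsynonym\t|\n",
]
index = build_name_index(sample)
print(index.get("Lactobacillus fermentum", []))
```

With a real dump, something like `build_name_index(open(f"{ncbi_taxdb_dir}/names.dmp"))` should work the same way, at the cost of holding every indexed name in memory.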

Would it be possible to support synonyms and equivalent names when calling taxid_from_name, as the Python package ete3 does?

Thanks a lot!

apcamargo commented 2 years ago

Hi @ShannonDaddy. This is something that I've been considering since I implemented taxid_from_name, but I was worried about the significant increase in memory usage. As far as I know, ETE3 uses an SQLite database, so memory is not really a problem for them.

That said, I can add a load_synonyms parameter (disabled by default) that would allow synonyms and equivalent names to be added to the database. Is this feature urgent for you?
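To illustrate the memory trade-off, here is a rough sketch of keeping the name-to-taxid mapping in SQLite instead of an in-memory dict. It is only an assumption about how such an index could look, not ETE3's or taxopy's actual schema; the table name and both function names are hypothetical, and the sample rows are illustrative.

```python
import sqlite3


def build_name_db(rows, path=":memory:"):
    """Store (name, taxid) pairs in SQLite (hypothetical helper).

    Passing a file path instead of ":memory:" keeps the data on disk.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS names (name TEXT, taxid INTEGER)")
    conn.executemany("INSERT INTO names VALUES (?, ?)", rows)
    # The index keeps lookups by name fast without loading all names into Python.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_names_name ON names (name)")
    conn.commit()
    return conn


def taxids_for_name(conn, name):
    """Return every taxid stored under the given name."""
    cur = conn.execute(
        "SELECT DISTINCT taxid FROM names WHERE name = ? ORDER BY taxid", (name,)
    )
    return [row[0] for row in cur]


conn = build_name_db([
    ("Limosilactobacillus fermentum", 1613),
    ("Lactobacillus fermentum", 1613),
])
print(taxids_for_name(conn, "Lactobacillus fermentum"))
```

The catch is that the on-disk variant needs persistent storage, while an in-memory dict pays for the same names in RAM.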

ShannonDaddy commented 2 years ago

It's not urgent for me. For now, I just create the Taxon object directly from the taxid. Take your time adding the new feature. Thanks for the quick response.

pooranis commented 2 years ago

I would like this feature as well! I have used ete3, but I prefer this library for the LCA functions and because it works in situations where memory is available, but persistent disk space is not.

apcamargo commented 2 years ago

Thanks for the feedback. This looks like a useful feature for many people (I also ended up needing it recently). I'll think about how to implement it without a large increase in memory usage as soon as I get some free time.