AntonOresten opened this issue 9 months ago.
Thank you for the kind comment and valuable feedback! I had indeed been thinking that the `name2taxids` function could be improved, and I think your ideas are very good. I'm very busy at the moment and have almost no time to devote to development, but I definitely want to address this. Of course, suggestions for a more detailed implementation, or PRs, are welcome!
Howdy!
I want to start by saying that I've found this package very convenient and useful! My only issue is the time complexity of the `name2taxids` function. It does a linear search through the `db.names` dictionary (of type `Dict{Int, String}`), collecting every ID whose name matches, which gets slow for large databases and especially when running many queries. I found that you can essentially invert `db.names` into a `name => taxids` dictionary (of type `Dict{String, Vector{Int}}`). Building it can take a couple of seconds, but that one-time cost is far outweighed by the minutes or even hours that might otherwise be spent on a linear search for every query. I reckon a function for creating such a dictionary would be nice to have. It's fairly trivial to do manually, but it requires accessing internals that are not user-facing.
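A minimal sketch of the inversion described above, written against plain dictionaries rather than the package itself. It assumes, as stated in the issue, that `db.names` maps taxid to name; a toy `Dict{Int, String}` stands in for it here. The helpers `linear_name2taxids`, `invert_names`, and `lookup_taxids` are hypothetical names for illustration and are not part of the package's API.

```julia
# Hypothetical helper mimicking the current behavior: scan every
# (taxid, name) pair and collect matching taxids. O(n) per query.
function linear_name2taxids(id2name::Dict{Int, String}, name::AbstractString)
    # Dict iteration order is unspecified, so sort for reproducible output.
    return sort!([taxid for (taxid, n) in id2name if n == name])
end

# Hypothetical helper: invert taxid => name into name => [taxids] once.
# Several taxids can share a name, hence Vector{Int} values.
function invert_names(id2name::Dict{Int, String})
    index = Dict{String, Vector{Int}}()
    sizehint!(index, length(id2name))
    for (taxid, name) in id2name
        # Create an empty vector the first time a name is seen, then append.
        push!(get!(Vector{Int}, index, name), taxid)
    end
    # Sort each taxid list in place so results do not depend on Dict order.
    foreach(sort!, values(index))
    return index
end

# Each query is now a hash lookup; unknown names give an empty vector.
lookup_taxids(index::Dict{String, Vector{Int}}, name::AbstractString) =
    get(index, name, Int[])

# Usage with toy data (IDs and the duplicate name are illustrative only).
toy = Dict(1 => "root", 2 => "Bacteria", 100 => "Bacteria", 9606 => "Homo sapiens")
index = invert_names(toy)
lookup_taxids(index, "Bacteria")         # [2, 100]
linear_name2taxids(toy, "Bacteria")      # [2, 100], same answer via a full scan
```

Building the index is a single pass over all entries; afterwards each query costs an average-case hash lookup instead of a scan of the whole dictionary. The price is the memory for a second dictionary.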
This is what I've been doing:
Cheers!