Open milot-mirdita opened 6 years ago
Thanks for the PR! You mention that you have ported the code. Are you still relying on GI numbers? If not, what approach are you following for acc => taxid conversion?
We work only with UniProt, which is sufficiently well annotated with NCBI taxa.
MMseqs2 annotates sequences with UniProt accessions, which are then mapped to NCBI taxa that the LCA tool can read.
We also implement a 2bLCA-like approach to get more reliable LCAs.
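For readers following along, the accession => taxon => LCA flow described above could be sketched roughly as below. This is only an illustration and not the MMseqs2 code: the accession names, the `ACC_TO_TAXID` and `PARENT` tables, and both function names are hypothetical. The toy tree collapses all intermediate ranks, and the 2bLCA-like filtering step is not shown.

```python
# Hypothetical toy data: placeholder accessions mapped to NCBI taxon IDs.
ACC_TO_TAXID = {"ACC_HUMAN": 9606, "ACC_MOUSE": 10090}

# Toy taxonomy as child -> parent, with intermediate ranks collapsed.
# The root (taxid 1) is its own parent.
PARENT = {1: 1, 314146: 1, 9606: 314146, 10090: 314146}


def lineage(taxid, parent):
    """Return the path from taxid up to the root, leaf first."""
    path = [taxid]
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca_of_hits(accessions, acc_to_taxid, parent):
    """Map hit accessions to taxa and return their lowest common ancestor."""
    taxids = [acc_to_taxid[a] for a in accessions if a in acc_to_taxid]
    if not taxids:
        return None  # no hit could be mapped to a taxon
    common = lineage(taxids[0], parent)
    for t in taxids[1:]:
        seen = set(lineage(t, parent))
        # Keep only ancestors shared with this hit; order stays leaf-first.
        common = [x for x in common if x in seen]
    return common[0]  # deepest shared ancestor
```

With this toy tree, the human and mouse placeholder hits resolve to their shared parent node rather than to the root.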
Ah, ok. Thanks
By the way, do you have any manuscript that we could cite?
No, not really. I never tried to publish this tool. But thanks for asking :-)
During evaluation of the tool I found that the DP matrix for the RMQ was not filled properly, so many node pairs resolved to the root as their LCA. For example, mouse + human resulted in the root node.
Thank you for the great implementation in all other regards. I have ported the code to C++ and integrated it into our homology search, clustering and metagenomics suite MMseqs2.
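To illustrate what a fully filled RMQ table looks like, here is a minimal sketch of the common Euler tour plus sparse table approach to LCA. It is neither the code from this repository nor the MMseqs2 port; `build_lca`, the `children` map, and the toy tree are assumptions made for the example. The point is that every row `k` of the table has to be filled for all valid start positions, otherwise range queries can return a node that is too shallow.

```python
def build_lca(children, root):
    """Build an LCA query function from a parent -> list-of-children map."""
    euler, depth, first = [root], [0], {root: 0}
    stack = [(root, 0, iter(children.get(root, ())))]
    # Iterative DFS that records every visit (Euler tour) and its depth.
    while stack:
        node, d, it = stack[-1]
        child = next(it, None)
        if child is None:
            stack.pop()
            if stack:  # returning to the parent counts as another visit
                euler.append(stack[-1][0])
                depth.append(stack[-1][1])
        else:
            first[child] = len(euler)
            euler.append(child)
            depth.append(d + 1)
            stack.append((child, d + 1, iter(children.get(child, ()))))

    n = len(euler)
    # table[k][i] = index of the minimum depth within euler[i : i + 2**k]
    table = [list(range(n))]
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):  # fill every valid start index
            a, b = prev[i], prev[i + half]
            row.append(a if depth[a] <= depth[b] else b)
        table.append(row)
        k += 1

    def lca(u, v):
        lo, hi = sorted((first[u], first[v]))
        k = (hi - lo + 1).bit_length() - 1  # largest power of two that fits
        a, b = table[k][lo], table[k][hi - (1 << k) + 1]
        return euler[a if depth[a] <= depth[b] else b]

    return lca
```

On a toy tree where human (9606) and mouse (10090) share a parent below the root, a query for the two should return that parent and not the root:

```python
lca = build_lca({1: [314146, 2], 314146: [9606, 10090]}, root=1)
assert lca(9606, 10090) == 314146
```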