bridgedb / BridgeDb

The BridgeDb Library source code
https://bridgedb.org/
Apache License 2.0

Database identifiers should be sorted by closest match #97

Open DeniseSl22 opened 5 years ago

DeniseSl22 commented 5 years ago

This issue was raised in the PathVisio issue tracker by @egonw, but @mkutmon and I agree that it belongs here.

Currently, PV uses freeAttributeSearch (which is part of BridgeDb) to search for free text (names of genes/proteins/compounds) in the locally loaded BridgeDb mapping files. The results do not seem to be sorted in a useful way. For example, searching for "TP53" first returns some longer names that contain the phrase TP53, and only afterwards the exact string 'TP53'. This also happens for metabolites (see the issue on PV). @ariutta suggested: "You could use Levenshtein distance."

This sorting should then happen in the results produced by freeAttributeSearch, so that PV automatically displays them in that order. Example code to look at: how to build your own custom comparator, and a comparator that uses the Levenshtein distance.
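
To make the suggestion concrete, here is a minimal sketch of a comparator that orders search hits by Levenshtein distance to the query. It works on plain strings only. The class name `ClosestMatchSort` and the methods `levenshtein`, `closestTo` and `sortByClosestMatch` are made up for illustration and are not part of BridgeDb, and how this would be wired into the freeAttributeSearch results is left open. Comparing case-insensitively and breaking ties alphabetically are assumptions as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not BridgeDb API: sorts hits by closeness to a query.
public class ClosestMatchSort {

    /** Levenshtein distance, computed with two rows of the DP table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Comparator: smallest distance to the query first, ties alphabetical. */
    static Comparator<String> closestTo(String query) {
        String q = query.toLowerCase(); // assumption: ignore case when matching
        return Comparator
                .comparingInt((String s) -> levenshtein(q, s.toLowerCase()))
                .thenComparing(Comparator.naturalOrder());
    }

    /** Returns a sorted copy; the input list is left untouched. */
    static List<String> sortByClosestMatch(String query, List<String> hits) {
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort(closestTo(query));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("TP53BP1", "TP53INP1", "TP53", "TP53I3");
        // prints [TP53, TP53I3, TP53BP1, TP53INP1]
        System.out.println(sortByClosestMatch("TP53", hits));
    }
}
```

With this ordering the exact 'TP53' hit comes first, followed by the names that need the fewest edits to reach the query. The distance is recomputed on every comparison, so for large result sets it may be worth computing it once per hit before sorting.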