VIDA-NYU / bdi-kit

A Python toolkit for biomedical data integration
https://bdi-kit.readthedocs.io
Apache License 2.0

Develop an algorithm to decide what is the best value mapping algorithm for a specific case #60

Open roquelopez opened 2 months ago

roquelopez commented 2 months ago

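For cases where we want to map ['pN0', 'pN1', 'pN2', 'pNX'] -> ['N0', 'N1', 'N2', 'NX'], the edit distance method is more promising, since each pair differs by a single prefix character. For cases like ['Deceased', 'Living'] -> ['Dead', 'Alive'], where the matching values are synonyms with little character overlap, the embedding approach is better.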

Implement a method to select the most promising algorithm.
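
One possible starting point, sketched here under several assumptions: score each candidate method by the mean similarity of the best match it finds for every source value, then pick the method with the highest mean. The names `select_matcher`, `edit_similarity`, and `jaccard_trigram` below are hypothetical and are not part of bdi-kit. Scores from different methods are not calibrated against each other, so this is only a rough proxy. An embedding-based scorer could be passed in the same way, but is left out because it needs an external model.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, keeping one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    # Edit distance normalized to [0, 1]; 1.0 means identical strings.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard_trigram(a, b):
    # Jaccard overlap of character trigrams, with '#' padding at both ends.
    def grams(s):
        s = f"##{s}##"
        return {s[i:i + 3] for i in range(len(s) - 2)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def select_matcher(source, target, matchers):
    # Hypothetical heuristic: mean best-match score per method, highest wins.
    scores = {}
    for name, sim in matchers.items():
        best_scores = [max(sim(s, t) for t in target) for s in source]
        scores[name] = sum(best_scores) / len(best_scores)
    return max(scores, key=scores.get), scores


matchers = {"edit_distance": edit_similarity, "jaccard_trigram": jaccard_trigram}
best, scores = select_matcher(["pN0", "pN1", "pN2", "pNX"],
                              ["N0", "N1", "N2", "NX"], matchers)
print(best, scores)
```

On the pN example this heuristic picks the edit-distance scorer. Whether it also picks correctly for synonym-style cases like 'Deceased' -> 'Dead' would have to be checked with a real embedding scorer.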

julianafreire commented 2 months ago

We should let users try different methods so that they can check and compare the results. For example, we can have a function that tries all matching strategies, or a set specified by the user: match_column_values(col1, col2, [edit_distance, embedding, jaccard_trigram]). As with attribute matching, we could display the results and allow users to indicate the correct one, or to provide an alternative method (e.g., a Python function).
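
A minimal sketch of what such a function might look like, assuming each method is a plain Python function that takes two strings and returns a similarity score. This is not the bdi-kit API. The signature mirrors the call above, but the bodies are illustrative: `edit_distance` uses `difflib.SequenceMatcher` as a standard-library stand-in rather than a true edit distance, `embedding` is omitted because it needs an external model, and `vital_status_lookup` is a hypothetical user-supplied function.

```python
from difflib import SequenceMatcher


def edit_distance(a, b):
    # Stand-in score: difflib's similarity ratio, not a true Levenshtein distance.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard_trigram(a, b):
    # Jaccard overlap of character trigrams, with '#' padding at both ends.
    def grams(s):
        s = f"##{s.lower()}##"
        return {s[i:i + 3] for i in range(len(s) - 2)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def vital_status_lookup(a, b):
    # Hypothetical user-provided method: a hand-written synonym table.
    synonyms = {("deceased", "dead"), ("living", "alive")}
    return 1.0 if (a.lower(), b.lower()) in synonyms else 0.0


def match_column_values(col1, col2, methods):
    # For every method, map each value in col1 to its best-scoring value in col2.
    results = {}
    for method in methods:
        mapping = {}
        for value in col1:
            best = max(col2, key=lambda cand: method(value, cand))
            mapping[value] = (best, round(method(value, best), 3))
        results[method.__name__] = mapping
    return results


results = match_column_values(
    ["Deceased", "Living"],
    ["Dead", "Alive"],
    [edit_distance, jaccard_trigram, vital_status_lookup],
)
# Show one mapping per method so a user can compare them side by side.
for name, mapping in results.items():
    print(name, mapping)
```

Keying the results by method name keeps the per-method mappings separate, which would make it straightforward to display them together and let the user mark the correct one.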