Open roquelopez opened 2 months ago
We should let users try different matching methods so that they can check and compare the results. For example, we could have a function that tries all matching strategies, or a subset specified by the user: match_column_values(col1, col2, [edit_distance, embedding, jaccard_trigram]). Similar to what we have for attribute matching, we could display the results and let users indicate the correct one or provide an alternative method (e.g., a Python function).
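A minimal sketch of what such a function could look like, to make the proposal concrete. Everything here is an assumption for discussion, not existing code: the return shape, the BUILTIN_METHODS registry, and the two pure-Python similarity functions are hypothetical. The embedding strategy is left out because it needs a model, but it could be passed in as a callable the same way a user-supplied function is.

```python
from typing import Callable, Dict, List, Tuple, Union

Similarity = Callable[[str, str], float]


def edit_distance(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0, 1] (1.0 means identical)."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def _trigrams(s: str) -> set:
    # Pad so that short strings still produce at least one trigram.
    padded = f"  {s.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard_trigram(a: str, b: str) -> float:
    """Jaccard similarity over character trigrams, in [0, 1]."""
    ta, tb = _trigrams(a), _trigrams(b)
    return len(ta & tb) / len(ta | tb)


# Hypothetical registry of named strategies; an embedding-based one
# could be registered here once a model is available.
BUILTIN_METHODS: Dict[str, Similarity] = {
    "edit_distance": edit_distance,
    "jaccard_trigram": jaccard_trigram,
}


def match_column_values(
    col1: List[str],
    col2: List[str],
    methods: List[Union[str, Similarity]],
) -> Dict[str, Dict[str, Tuple[str, float]]]:
    """For each method, map every value in col1 to its best match in col2.

    `methods` may mix built-in names and user-supplied callables.
    """
    results = {}
    for method in methods:
        if callable(method):
            name, fn = getattr(method, "__name__", repr(method)), method
        else:
            name, fn = method, BUILTIN_METHODS[method]
        results[name] = {
            value: max(((cand, fn(value, cand)) for cand in col2), key=lambda p: p[1])
            for value in col1
        }
    return results
```

The per-method results could then be shown side by side so the user picks the correct mapping, in the same spirit as the attribute matching display.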
For cases where we want to map
['pN0', 'pN1', 'pN2', 'pNX'] -> ['N0', 'N1', 'N2', 'NX']
the edit distance method is more promising, but for cases like
['Deceased', 'Living'] -> ['Dead', 'Alive']
the embedding approach is better.

Implement a method to select the most promising algorithm.
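
One possible heuristic for the selection, sketched under assumptions: score each method by how decisively it separates the best candidate from the runner-up (the mean top-1 vs. top-2 margin over all source values) and pick the method with the largest margin. The function names are hypothetical, difflib's SequenceMatcher ratio stands in for the edit distance method, and toy_semantic is a hard-coded placeholder for a real embedding model so that the example runs without one. Scores from different methods are not calibrated against each other, so this is a starting point rather than a reliable criterion.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Similarity = Callable[[str, str], float]


def string_similarity(a: str, b: str) -> float:
    # Stand-in for the edit distance method (stdlib only).
    return SequenceMatcher(None, a, b).ratio()


# Placeholder for an embedding model: a tiny hand-written synonym table.
_TOY_SYNONYMS = {("deceased", "dead"), ("living", "alive")}


def toy_semantic(a: str, b: str) -> float:
    pair = (a.lower(), b.lower())
    return 1.0 if pair in _TOY_SYNONYMS or pair[::-1] in _TOY_SYNONYMS else 0.0


def mean_margin(col1: List[str], col2: List[str], fn: Similarity) -> float:
    """Average gap between the best and second-best score per source value."""
    margins = []
    for value in col1:
        scores = sorted((fn(value, cand) for cand in col2), reverse=True)
        second = scores[1] if len(scores) > 1 else 0.0
        margins.append(scores[0] - second)
    return sum(margins) / len(margins)


def select_method(
    col1: List[str], col2: List[str], methods: Dict[str, Similarity]
) -> str:
    """Return the name of the method with the largest mean margin."""
    return max(methods, key=lambda name: mean_margin(col1, col2, methods[name]))


methods = {"edit_distance": string_similarity, "embedding": toy_semantic}
print(select_method(["pN0", "pN1", "pN2", "pNX"], ["N0", "N1", "N2", "NX"], methods))
print(select_method(["Deceased", "Living"], ["Dead", "Alive"], methods))
```

A low margin for every method could also be surfaced to the user as a signal that manual review or a custom function is needed.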