As discussed in issue #859, we should avoid strict equality checks for classified strings and instead use a fuzzy approach like the Levenshtein distance. To prevent outliers from being forced into the nearest label, a threshold is set at 0.5. This is fairly permissive but should be suitable for most use cases.
Let's try it out
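For reviewers, here is a minimal sketch of the idea rather than the code in this PR. It assumes the 0.5 threshold applies to the Levenshtein distance normalized by the length of the longer string, and that comparison is case-insensitive; both are assumptions. The names `levenshtein`, `normalized_distance`, and `closest_label` are hypothetical.

```python
from typing import Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Scale the edit distance to [0, 1] by the longer string's length (assumed)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def closest_label(
    prediction: str, labels: Sequence[str], threshold: float = 0.5
) -> Optional[str]:
    """Return the nearest label, or None if even the nearest is past the threshold."""
    cleaned = prediction.strip().lower()
    best_label, best_distance = None, float("inf")
    for label in labels:
        distance = normalized_distance(cleaned, label.strip().lower())
        if distance < best_distance:
            best_label, best_distance = label, distance
    # Outliers are rejected instead of being forced into the nearest label.
    return best_label if best_distance <= threshold else None
```

With labels `["positive", "negative"]`, a misspelled prediction such as `"positve"` still resolves to `"positive"`, while an unrelated string such as `"banana"` is rejected and returns `None`.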
Type of change
[X] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] This change requires a documentation update
Checklist:
[X] My code follows the style guidelines of this project
[X] I have performed a self-review of my own code
[X] I have commented my code, particularly in hard-to-understand areas