MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License
736 stars 67 forks

Which mode/method to use that is agnostic to word orders? #51

Open skwskwskwskw opened 1 year ago

skwskwskwskw commented 1 year ago

Hi,

I would like to understand which matching algorithm/model is agnostic to word order. I realised that Levenshtein distance, for instance, is affected by word order.

Thanks

MaartenGr commented 1 year ago

You can use TF-IDF for that, since it typically only considers n-grams at the token level. Because of its bag-of-words-like approach, it does not take the order of the n-grams into account, and therefore does not take the order of the words into account either.
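
To illustrate the point, here is a minimal standard-library sketch, not PolyFuzz's actual implementation. It assumes character 3-grams taken inside each whitespace-separated token and plain counts instead of TF-IDF weights. The helper names (`token_ngrams`, `cosine_similarity`, `levenshtein`) are made up for this example.

```python
from collections import Counter
import math


def token_ngrams(text, n=3):
    """Count character n-grams taken inside each whitespace-separated token."""
    grams = []
    for token in text.lower().split():
        if len(token) < n:
            grams.append(token)  # keep short tokens whole
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return Counter(grams)


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def levenshtein(s, t):
    """Classic dynamic-programming edit distance on characters."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


a = "apple banana"
b = "banana apple"

# Same bag of n-grams regardless of word order.
bag_score = cosine_similarity(token_ngrams(a), token_ngrams(b))
# Character edits are needed to turn one string into the other.
edit_distance = levenshtein(a, b)

print(bag_score, edit_distance)
```

Both strings produce the same bag of n-grams, so the cosine score is 1 (up to floating-point rounding), while the edit distance is nonzero because the characters sit in different positions. In PolyFuzz itself, the TF-IDF matcher is selected with `PolyFuzz("TF-IDF")`.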