MaartenGr / PolyFuzz

Fuzzy string matching, grouping, and evaluation.
https://maartengr.github.io/PolyFuzz/
MIT License
725 stars · 68 forks

Update _tfidf.py to allow for sentence-level TFIDF ngram-matching #44

Open DGaffney opened 1 year ago

DGaffney commented 1 year ago

I need a sentence-level ngram option since I'm checking on similarities between short texts. Maybe this option is useful for others!

MaartenGr commented 1 year ago

Apologies for the late reply! I have to look into this a bit further, as this could also be resolved by simply keeping the whitespace. It might even make sense to create a different back-end that is optimized for sentence-level matching.
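
To make the two options concrete, here is a minimal standalone sketch of the difference between character n-grams that drop whitespace, character n-grams that keep it, and word-level n-grams. The function names `char_ngrams` and `word_ngrams` and the `keep_whitespace` flag are hypothetical and only for illustration; this is not the code in `_tfidf.py` or in this PR.

```python
from typing import List


def char_ngrams(text: str, n: int = 3, keep_whitespace: bool = True) -> List[str]:
    """Character n-grams; optionally drop any n-gram that contains a space.

    Hypothetical helper for illustration, not PolyFuzz's implementation.
    """
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not keep_whitespace:
        # Dropping these n-grams discards all information about word boundaries.
        grams = [g for g in grams if " " not in g]
    return grams


def word_ngrams(text: str, n: int = 2) -> List[str]:
    """Word-level n-grams, with tokens split on whitespace and rejoined by a space."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "red car fast"
print(char_ngrams(sentence, 3, keep_whitespace=False))
# ['red', 'car', 'fas', 'ast']
print(char_ngrams(sentence, 3, keep_whitespace=True))
# ['red', 'ed ', 'd c', ' ca', 'car', 'ar ', 'r f', ' fa', 'fas', 'ast']
print(word_ngrams(sentence, 2))
# ['red car', 'car fast']
```

Keeping whitespace retains some word-order signal at the character level, while word-level n-grams treat each token as a unit, which may suit short sentences better but loses tolerance for typos inside a word.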

DGaffney commented 1 year ago

Sorry for my late reply as well. Yes, this is definitely not optimized - I just wanted to start the conversation in a helpful way rather than simply demanding that you build it :)