amjith / fuzzyfinder

Fuzzy Finder implemented in Python
BSD 3-Clause "New" or "Revised" License

not good for Chinese? #14

Closed SeekPoint closed 6 years ago

SeekPoint commented 6 years ago

from fuzzyfinder import fuzzyfinder

suggestions = fuzzyfinder('可大讯飞', ['仅 就 第三季度 而言,虽然 科大讯飞 管理 费用 与 研发 费用 都 在 大幅 提升,但 两项 之 和 与 营收 的 比例 为 24% ,去年 同期 的 25% 还 要 低 一个 百分点 。 因此,将>三季度 扣非净利润 降低,归咎于 管理 费用 与 研发 费用 的 提升,显然 不太 恰当。'])
print(list(suggestions))

Output: []
Expected: [科大讯飞], since only one Chinese character is different.

amjith commented 6 years ago

This is brilliant!! I never tested this outside of English, but I'm glad you did.

Here's the problem: this library does not try to find the closest matches to the word you typed by tolerating typos. It is for narrowing a list of long strings as you type bits and pieces of the string you want. Every character you type has to appear in the candidate string, in order, so a single wrong character such as 可 rules the whole string out.

There is a blog post that explains how this works: https://blog.amjith.com/fuzzyfinder-in-10-lines-of-python
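To make the behaviour concrete, here is a minimal sketch of the subsequence-matching idea described above. It is an approximation in the spirit of the blog post, not the library's actual code; the name `subsequence_filter` and the shortened sample sentence are made up for illustration.

```python
import re


def subsequence_filter(user_input, collection):
    """Keep items that contain every typed character, in order.

    Hypothetical helper for illustration only; not the fuzzyfinder API.
    """
    # Join the typed characters with a non-greedy wildcard, so that
    # 'djm' becomes the pattern 'd.*?j.*?m'.
    pattern = '.*?'.join(re.escape(ch) for ch in user_input)
    regex = re.compile(pattern)

    ranked = []
    for item in collection:
        match = regex.search(item)
        if match:
            # Rank compact matches first, then matches that start earlier.
            ranked.append((len(match.group()), match.start(), item))
    return [item for _, _, item in sorted(ranked)]


sentence = '虽然 科大讯飞 管理 费用 与 研发 费用 都 在 大幅 提升'

# '可' never occurs in the sentence, so the pattern cannot match at all.
print(subsequence_filter('可大讯飞', [sentence]))

# With the correct first character the sentence is kept.
print(subsequence_filter('科大讯飞', [sentence]))
```

Under this scheme a typo is not "almost a match": one character that is missing from the candidate is enough to drop it from the results.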

What you're looking for is the fuzzywuzzy library, which finds the closest matches to what you have typed based on Levenshtein distance.
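For comparison, here is a rough sketch of typo-tolerant matching by Levenshtein distance, using only the standard library. It does not use or reproduce fuzzywuzzy's API; `levenshtein` and `closest_token` are hypothetical names, and splitting on whitespace assumes the text is already segmented into words, as in the example from the question.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def closest_token(query, text):
    """Return the whitespace-separated token of text nearest to query.

    Hypothetical helper; assumes text is already segmented into words.
    """
    return min(text.split(), key=lambda token: levenshtein(query, token))


sentence = '虽然 科大讯飞 管理 费用 与 研发 费用 都 在 大幅 提升'

# One substitution (可 -> 科) separates the query from the intended word.
print(levenshtein('可大讯飞', '科大讯飞'))
print(closest_token('可大讯飞', sentence))
```

Here the query is compared against each word and the nearest one wins, so a single wrong character only raises the distance by one instead of eliminating the candidate.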

SeekPoint commented 6 years ago

OK, I'll check, thanks.