Find most similar - Githubissues

angeloskath / php-nlp-tools

Natural Language Processing Tools in PHP

Do What The F*ck You Want To Public License

753 stars, 153 forks

Find most similar #63

Closed: it-is-hacker-time closed this issue 6 years ago

it-is-hacker-time commented 6 years ago

What algorithm should I use to find the closest match for a string in a set of strings?

Example of known inputs:

I would like a cheese pizza
I would like a cheese pizza with onions
I would like a cheese pizza without onions

Input I want to match against them to find the most similar one, if any of them is similar (in this example the only differences are spelling mistakes):

I would like a ceese pizza with out onnions.

angeloskath commented 6 years ago

There are several similarity measures implemented in the library. The main idea is to compute features for each string and then compute the similarity between strings from those features.
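To illustrate the features-then-similarity idea, here is a minimal sketch in Python rather than PHP. It does not use the library's API. The choice of character trigrams as features and Jaccard overlap as the similarity is an assumption made for this example, picked because trigrams tolerate spelling mistakes. The names `char_ngrams`, `jaccard` and `most_similar` are hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams of text (the features)."""
    normalized = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_similar(query, candidates, n=3):
    """Return the candidate whose n-gram set overlaps most with the query's."""
    query_features = char_ngrams(query, n)
    return max(candidates, key=lambda c: jaccard(query_features, char_ngrams(c, n)))


# Inputs taken from the question above.
known = [
    "I would like a cheese pizza",
    "I would like a cheese pizza with onions",
    "I would like a cheese pizza without onions",
]
query = "I would like a ceese pizza with out onnions."

best = most_similar(query, known)
scores = {c: jaccard(char_ngrams(query), char_ngrams(c)) for c in known}
print(best)
print(scores)
```

On this input the two onion variants score close to each other, because "with out" sits between "with" and "without". It is worth looking at the scores rather than only the top match, and applying a minimum-score threshold if "no match" should be a possible answer.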

If you want to try an experimental feature, the develop branch has a KDTree implementation that can return the k-nearest neighbors of a document within a set of documents.
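As a rough picture of what a k-nearest-neighbors query returns, the sketch below ranks the known strings against the query and keeps the top k. It is written in Python and uses a plain exhaustive scan as a stand-in for the tree. It is not the library's KDTree, and the trigram-count features, the cosine similarity and the function names are all assumptions made for this example.

```python
import heapq
import math
from collections import Counter


def trigram_counts(text):
    """Bag of lowercase character trigrams (a sparse feature vector)."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def k_nearest(query, documents, k=2):
    """Return the k (score, document) pairs most similar to query, best first.

    Brute-force scan over all documents; a tree index would only change how
    the candidates are searched, not what a neighbor is.
    """
    q = trigram_counts(query)
    scored = [(cosine(q, trigram_counts(d)), d) for d in documents]
    return heapq.nlargest(k, scored)


# Inputs taken from the question above.
known = [
    "I would like a cheese pizza",
    "I would like a cheese pizza with onions",
    "I would like a cheese pizza without onions",
]
query = "I would like a ceese pizza with out onnions."

neighbors = k_nearest(query, known, k=2)
for score, doc in neighbors:
    print(f"{score:.3f}  {doc}")
```

Returning several neighbors with their scores, instead of a single best match, makes near-ties visible and lets the caller decide how to break them.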