Esukhia / sympound-python

Python version of the SymSpell Compound algorithm
https://pypi.org/project/sympound/
MIT License

sympound-python

This library is an implementation of the SymSpellCompound algorithm in Python. It was initially forked from rcourivaud/symspellcompound, although most of the code has been rewritten.

Installation

pip install sympound

Documentation

If you want a quick complete example, see example.py.

Creating the sympound object

The first step is to create a sympound object. The constructor takes two main arguments:

Adding dictionaries

Dictionaries can then be added with the load_dictionary(filename) function, which typically takes a file path as its argument. A dictionary file is typically either a list of words (one per line) or a list of words with their frequencies (a word and its frequency on each line, separated by a space). See example-dict.txt for an example.
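
The two line formats can be illustrated with the small parser below. It is a hypothetical helper, not part of sympound and not a description of how load_dictionary works internally: it only shows how such a file can be read. Giving bare words a default frequency of 1 and summing the counts of repeated words are assumptions of this sketch.

```python
from typing import Dict, Iterable


def parse_dictionary_lines(lines: Iterable[str], default_count: int = 1) -> Dict[str, int]:
    """Read 'word' or 'word frequency' lines into a {word: count} dict.

    Hypothetical helper for illustration only; the default count for bare
    words and the summing of duplicates are assumptions, not sympound behavior.
    """
    counts: Dict[str, int] = {}
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        word = parts[0]
        # A bare word gets the assumed default; otherwise read the second column.
        count = int(parts[1]) if len(parts) > 1 else default_count
        counts[word] = counts.get(word, 0) + count
    return counts


sample = ["hello 120", "world 80", "sympound"]
print(parse_dictionary_lines(sample))  # {'hello': 120, 'world': 80, 'sympound': 1}
```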

You can also add entries directly with create_dictionary_entry(key, count), where key is the valid string and count is the frequency associated with it. This is the recommended method when your data is not in the simple format described above.
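
As a sketch of that case, the function below feeds rows of a CSV file into create_dictionary_entry(key, count). It assumes ssc is an already constructed sympound object; the function name and the "word" and "count" column names are hypothetical and should be adapted to your data.

```python
import csv
from typing import Any, TextIO


def add_entries_from_csv(ssc: Any, stream: TextIO,
                         word_col: str = "word", count_col: str = "count") -> int:
    """Call ssc.create_dictionary_entry(key, count) for each CSV row.

    `ssc` is assumed to be a sympound object; the column names are
    hypothetical placeholders. Returns the number of entries added.
    """
    added = 0
    for row in csv.DictReader(stream):
        # The key is the valid string, the count its frequency.
        ssc.create_dictionary_entry(row[word_col], int(row[count_col]))
        added += 1
    return added
```

With a real object, this would be called as add_entries_from_csv(ssc, open("words.csv", newline="", encoding="utf-8")), where "words.csv" is a placeholder file name.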

A lot of computation happens at this stage, and adding a large number of entries can easily take more than a minute, so two functions are provided to save the analyzed dictionaries as a pickle and load them back: save_pickle(filename) and load_pickle(filename), both taking a file path as argument. Note that the pickle is gzipped.
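
A common way to use these two functions is to pay the analysis cost only once: load the pickle if it exists, otherwise load the dictionaries and save the result. The sketch below assumes ssc is a sympound object and that load_pickle restores the saved state into the object it is called on; the function name and file paths are hypothetical.

```python
import os
from typing import Any, Iterable


def load_or_build(ssc: Any, pickle_path: str, dictionary_paths: Iterable[str]) -> Any:
    """Reuse a saved pickle when present; otherwise analyze and save.

    `ssc` is assumed to be a sympound object whose load_pickle() restores
    state in place. The helper name and paths are hypothetical.
    """
    if os.path.exists(pickle_path):
        # Fast path: restore the previously analyzed dictionaries.
        ssc.load_pickle(pickle_path)
    else:
        # Slow path: run the expensive analysis once, then cache it.
        for path in dictionary_paths:
            ssc.load_dictionary(path)
        ssc.save_pickle(pickle_path)
    return ssc
```

Note that this pattern does not notice when a dictionary file changes; delete the pickle to force a rebuild.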

Lookup

Once the dictionaries are loaded, you can get suggestions for a string by calling lookup_compound(str, edit_distance_max), where str is the string you want to analyze and edit_distance_max is the maximum edit distance you want suggestions for.
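
To make edit_distance_max concrete: it bounds how many single-character edits may separate the input from a suggestion. This section does not say which distance function sympound uses, so the sketch below uses plain Levenshtein distance (insertions, deletions, substitutions) purely as an illustration; the library may count edits differently, and lookup_compound also handles splitting and merging words, which this sketch does not model.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("helo", "hello"))      # 1
print(levenshtein("kitten", "sitting"))  # 3
```

With edit_distance_max set to 2, a candidate like "hello" would be within reach of "helo", while "sitting" would be too far from "kitten".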

lookup_compound returns a sorted list of SuggestItem objects, each containing three fields:

Maintenance

Uploading to PyPI:

python setup.py sdist
twine upload dist/*

Copyright

The code is Copyright Esukhia, 2018, and is distributed under the MIT License.