anhaidgroup / py_stringmatching

A comprehensive and scalable set of string tokenizers and similarity measures in Python
https://sites.google.com/site/anhaidgroup/projects/py_stringmatching
BSD 3-Clause "New" or "Revised" License

Editex groupings not correct #77

Open alexanderamy opened 2 years ago

alexanderamy commented 2 years ago

The letter_groups dict on lines 235-246 of editex.py maps each character to a single integer, so a character cannot belong to more than one group. For example, Z should be in groups 8 and 9 per page 4 of Phonetic String Matching: Lessons from Information Retrieval, but it is only in group 9 in the current implementation.
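One possible direction, shown as a rough sketch rather than a patch against editex.py: map each character to the set of groups it belongs to, and treat two characters as "same group" when those sets intersect. The names EDITEX_GROUPS, LETTER_GROUPS, and share_group are hypothetical and do not exist in py_stringmatching, and the groupings below are written from my reading of the table in the paper, so they should be checked against it before use.

```python
# Hypothetical sketch; none of these names exist in py_stringmatching.
# Groupings assumed from the Editex table in the paper (verify before use).
EDITEX_GROUPS = (
    "AEIOUY",  # group 0
    "BP",      # group 1
    "CKQ",     # group 2
    "DT",      # group 3
    "LR",      # group 4
    "MN",      # group 5
    "GJ",      # group 6
    "FPV",     # group 7
    "SXZ",     # group 8
    "CSZ",     # group 9
)

# Build a char -> set-of-group-ids mapping so that a letter listed in
# several groups (C, P, S, Z) keeps all of its memberships.
_membership = {}
for group_id, letters in enumerate(EDITEX_GROUPS):
    for ch in letters:
        _membership.setdefault(ch, set()).add(group_id)
LETTER_GROUPS = {ch: frozenset(ids) for ch, ids in _membership.items()}


def share_group(a, b):
    """Return True if characters a and b have at least one group in common."""
    groups_a = LETTER_GROUPS.get(a.upper(), frozenset())
    groups_b = LETTER_GROUPS.get(b.upper(), frozenset())
    return not groups_a.isdisjoint(groups_b)
```

With a mapping like this, Z is reported in both groups 8 and 9, so it compares as same-group with X (via group 8) and with C (via group 9). A one-to-one dict can only capture one of those two relationships.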