Open BrJohan opened 10 years ago
I think you might be better off with Soundex or something similar. Soundex assigns a value to a word, such that words that are pronounced the same are assigned the same value.
The fuzzy library ( https://pypi.python.org/pypi/Fuzzy ) might be a good place to start.
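For reference, here is a rough pure-Python sketch of classic (American) Soundex, so the idea can be tried without installing anything. This is not the Fuzzy library's implementation; the function name `soundex` and the `length` parameter are made up for illustration, and non-ASCII letters such as å, ä, ö are simply treated like vowels, which is an assumption.

```python
# Letter groups used by classic American Soundex.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str, length: int = 4) -> str:
    """Return a Soundex-style code: first letter plus up to length-1 digits."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    previous = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate equal codes.
            continue
        code = _CODES.get(ch, "")  # vowels (and unknown letters) have no code
        if code and code != previous:
            result += code
        previous = code
    # Truncate or pad with zeros to the requested length.
    return result[:length].ljust(length, "0")


for name in ("Kristina", "Cristina", "Christina", "Chrestina", "Christine"):
    print(name, soundex(name))
```

With this sketch, Cristina, Christina, Chrestina and Christine all get C623, but Kristina gets K623, because Soundex keeps the first letter as-is. So a K/C variation at the start of a name would still need some extra handling.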
I would like to suggest adding the option to compare persons' names using the Levenshtein distance algorithm. See http://en.wikipedia.org/wiki/Levenshtein_distance
My genealogical 'research' is primarily related to Sweden. Very often a person's name is spelled slightly differently in various source documents.
Example: Kristina - Cristina - Christina - Chrestina - Christine
Using the suggested algorithm with some (fairly small) maximum allowed distance would be most helpful when trying to find duplicate persons in my database.
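To make the request concrete, here is a minimal pure-Python sketch of the idea. It is only an illustration under my own assumptions: the function names `levenshtein` and `similar_names` and the default `max_distance` of 2 are hypothetical and not existing code in the project.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions or substitutions
    needed to turn a into b (standard dynamic-programming formulation)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similar_names(name, candidates, max_distance=2):
    """Return the candidates within max_distance of name, ignoring case."""
    key = name.casefold()
    return [c for c in candidates
            if levenshtein(key, c.casefold()) <= max_distance]


names = ["Cristina", "Christina", "Chrestina", "Christine", "Karl"]
print(similar_names("Kristina", names))
print(similar_names("Kristina", names, max_distance=3))
```

Measured from Kristina, the distances are 1 (Cristina), 2 (Christina), 3 (Chrestina) and 3 (Christine), so a maximum distance of 3 would be needed to catch all the variants in the example, while 2 already separates them from an unrelated name like Karl.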