Yomguithereal / talisman

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
https://yomguithereal.github.io/talisman/
MIT License

FONEM rule C-27 #154

Open chrislit opened 6 years ago

chrislit commented 6 years ago

Hi @Yomguithereal, I finally got around to implementing FONEM in Abydos. I also found the paper a bit lacking in specifics, especially with respect to rule ordering, and I'm not very confident that my implementation is entirely correct. But I do think I spotted one bug in your rule C-27 (unless I misunderstood the rule, since I don't read French). My understanding is that C-27 should change Z to S when it is preceded by a vowel or sits between two consonants, but your regexp makes the change when Z is followed by a vowel or sits between two consonants. As a result, I got different results on the inputs OZOUADE and POUYEZ.
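To make the two readings of C-27 concrete, here is a minimal sketch of each as a standalone regexp. It is not the code from talisman or Abydos: the vowel and consonant sets, the names `precededByVowel`, `followedByVowel` and `applyRule`, and the choice to run the rule in isolation are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the two readings of rule C-27 discussed above.
// The letter classes below are assumptions, not taken from the FONEM paper.
const V = 'AEIOUY';               // assumed vowel set
const C = 'BCDFGHJKLMNPQRSTVWXZ'; // assumed consonant set

// Reading A: Z -> S when preceded by a vowel, or between two consonants.
const precededByVowel = new RegExp(
  `(?<=[${V}])Z|(?<=[${C}])Z(?=[${C}])`, 'g'
);

// Reading B: Z -> S when followed by a vowel, or between two consonants.
const followedByVowel = new RegExp(
  `Z(?=[${V}])|(?<=[${C}])Z(?=[${C}])`, 'g'
);

// Apply a single rule to an already upper-cased word.
function applyRule(rule, word) {
  return word.replace(rule, 'S');
}

console.log(applyRule(precededByVowel, 'POUYEZ')); // POUYES
console.log(applyRule(followedByVowel, 'POUYEZ')); // POUYEZ
```

With this rule alone, POUYEZ separates the two readings because its final Z has a vowel before it and nothing after it. OZOUADE does not: its Z sits between two vowels, so both sketches rewrite it to OSOUADE. Any difference on that input would have to come from interaction with other rules, which this sketch does not model.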

Yomguithereal commented 6 years ago

Hmm... I need to dig the paper up on my computer and check that. Thanks @chrislit.

Yomguithereal commented 6 years ago

You're right. It seems I messed up. Can you just tell me what you got with OZOUADE and POUYEZ?

Yomguithereal commented 6 years ago

This algorithm is very messy though :). If you ever need a better algorithm for French, I started writing phonetic algorithms myself there. The French one is very good but quite complex, and I need a straightforward rule format so that it becomes easier to port to other languages. You can test it here. But note that only the French and Spanish ones are correct; the other languages are not finished.