EdouardBERGE / phonetic

Latin soundex specialised for French, with better mimicry and more discrimination

dbg_corpus might be missing accents #2

Closed: gaspardpetit closed this issue 9 months ago

gaspardpetit commented 9 months ago

Hello!

I notice that the dbg_corpus is populated with words without accents, for example:

INSERT INTO `dbg_corpus` VALUES (13, 'facon', 'FASON');

Consequently, facon gets converted to FAKON instead of FASON.
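
To illustrate the effect, here is a minimal Python sketch. It is not the repository's algorithm: `strip_accents` and `toy_c_rule` are hypothetical helpers, and the single rule shown (ç gives S, c before e/i/y gives S, any other c gives K) is only an assumption that happens to match the FAKON/FASON behaviour described above.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Remove combining marks after NFD decomposition, e.g. 'façon' -> 'facon'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_c_rule(word: str) -> str:
    """Hypothetical rule, for illustration only (not the project's real code):
    'ç' -> S, 'c' before e/i/y -> S, any other 'c' -> K.
    Every other letter is copied through, unaccented and upper-cased."""
    lowered = word.lower()
    out = []
    for i, ch in enumerate(lowered):
        nxt = lowered[i + 1] if i + 1 < len(lowered) else ""
        if ch == "ç":
            out.append("S")
        elif ch == "c":
            out.append("S" if nxt in ("e", "i", "y") else "K")
        else:
            out.append(strip_accents(ch).upper())
    return "".join(out)


print(strip_accents("façon"))  # facon
print(toy_c_rule("façon"))     # FASON
print(toy_c_rule("facon"))     # FAKON
```

Once the cedilla has been stripped from the corpus entry, the information needed to choose S over K is gone, so the expected code FASON can no longer be derived from the stored word.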

Is this intentional, or were the accents lost in the process of creating the git repo?

Cheers,

Gaspard

EdouardBERGE commented 9 months ago

I do not remember whether the MySpell dictionary was without accents; it was a long time ago. Anyway, this was intended to be used for any Latin language, so without accents. As you noticed, it's not perfect; the goal was only to beat the classic methods. Handling accents may be part of a brand new project or a strong evolution of this one (I have no time for this).

gaspardpetit commented 9 months ago

Understood, thanks for the response. Your work was useful and inspiring for me!