Open aspell-helper opened 14 years ago
Kevin Atkinson kevina\@sf updated the issue on 2011-06-27 23:38:59 UTC
Kevin Atkinson kevina\@sf updated the issue on 2011-07-03 22:03:54 UTC
Kevin Atkinson kevina\@sf commented on 2011-07-03 22:03:54 UTC
So the problem is that d'orgán and Dorgan both have the same "clean" value, "dorgan", but different soundslike values, "T*R*K*" and "T*R*K*N" respectively, which violates some assumptions I made. I'm not sure how I am going to fix this.
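To make the violated assumption concrete, here is a minimal sketch of a check that groups words by "clean" value and flags any group with more than one soundslike. The clean() helper is only an approximation assumed for illustration (strip accents and non-letters, lowercase); it is not Aspell's actual implementation. The soundslike values are the two reported above.

```python
import unicodedata
from collections import defaultdict


def clean(word):
    """Approximate a "clean" form: drop accents and non-letters, then lowercase.

    Assumption for illustration only; Aspell's real cleaning is
    driven by the language's character data.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if ch.isalpha()).lower()


def conflicts(soundslike_by_word):
    """Return clean forms that map to more than one soundslike value."""
    by_clean = defaultdict(set)
    for word, soundslike in soundslike_by_word.items():
        by_clean[clean(word)].add(soundslike)
    return {c: sorted(s) for c, s in by_clean.items() if len(s) > 1}


# Soundslike values as reported in this thread.
reported = {"d'orgán": "T*R*K*", "Dorgan": "T*R*K*N"}

print(conflicts(reported))
```

Any non-empty result is a case where two words share a clean form but disagree on soundslike.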
Kevin Atkinson kevina\@sf commented on 2011-07-04 01:06:51 UTC
And fixing this will almost certainly require breaking the dictionary format, further complicating things.
Kevin Scannell cos\@sf commented on 2011-07-18 15:27:21 UTC
Ok, maybe we're homing in on the problem. Both of those words *should* have a soundslike of "T*R*K*N". But I can't find a problem in the gaeilge_phonet.dat file.
As a simpler example, consider "organ". It should have a soundslike of *R*K*N, but it comes out as *R*K*.
The rule that's causing the trouble appears to be:
R(BGM)- R*
I think this because, for example, the string "oragan" correctly gives *R*K*N.
Am I not allowed to use the - syntax together with characters in parens as above? That syntax seems to work correctly in other places.
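For reference, here is a rough sketch of the behavior I would expect from that rule, assuming a trailing "-" means the whole pattern must match but the last letter is left in place for later rules. This is a toy matcher with a hypothetical rule table (TOY_RULES), not the real phonet code and not the contents of gaeilge_phonet.dat, and it only handles unaccented input.

```python
def parse_rule(pattern):
    """Split a pattern like 'R(BGM)-' into letter sets and a consumed count."""
    body = pattern.rstrip("-")
    dashes = len(pattern) - len(body)
    slots = []
    i = 0
    while i < len(body):
        if body[i] == "(":
            close = body.index(")", i)
            slots.append(set(body[i + 1:close]))  # any one of these letters
            i = close + 1
        else:
            slots.append({body[i]})
            i += 1
    # Each trailing '-' leaves one matched letter unconsumed (at least 1 consumed).
    return slots, max(1, len(slots) - dashes)


# Hypothetical toy rules, tried in order; only the first mirrors the thread.
TOY_RULES = [
    ("R(BGM)-", "R*"),
    ("R", "R"),
    ("D", "T"),
    ("G", "K"),
    ("N", "N"),
    ("(AEIOU)", "*"),
]


def soundslike(word):
    """Apply the toy rules left to right; letters with no rule are skipped."""
    word = word.upper()
    out = []
    pos = 0
    while pos < len(word):
        for pattern, replacement in TOY_RULES:
            slots, consumed = parse_rule(pattern)
            chunk = word[pos:pos + len(slots)]
            if len(chunk) == len(slots) and all(c in s for c, s in zip(chunk, slots)):
                out.append(replacement)
                pos += consumed
                break
        else:
            pos += 1
    return "".join(out)


for w in ("organ", "oragan", "dorgan"):
    print(w, soundslike(w))
```

Under that reading, the G in "organ" is still available to the later rules, so the trailing N should not be lost.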
Kevin Atkinson kevina\@sf updated the issue on 2011-07-19 18:29:08 UTC
Kevin Atkinson kevina\@sf commented on 2011-07-19 18:29:08 UTC
There could also be a bug in the phonet code. I did not write the original code, and it has been a while since I looked at it. I will try to have a look sometime soon.
If you feel so inclined, you are welcome to look for yourself.
Kevin Scannell cos\@sf created a bug report on 2010-07-26 16:02:39 UTC (Orig. from https://sourceforge.net/p/aspell/bugs/243)
Using Aspell 0.60.6 on Ubuntu, and a fresh install of the Irish dictionary aspell5-ga-4.4-0.tar.bz2.
Immediately after "sudo make install" of the dictionary, I run this, expecting no output:
$ /usr/bin/word-list-compress d < ga.cwl | iconv -f iso-8859-1 -t utf8 | aspell --lang=ga list
d'orgán
m'orgán
n-arm
t-arm
These four aren't recognized though. The other 326042 are ok!