cesine / phonetisaurus

Automatically exported from code.google.com/p/phonetisaurus

I busted something. #12

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
After rewriting the 'decoder' in C++ a second time I apparently busted 
something very minor.  The NETtalk15k test results are now about 1% worse than 
they were a week ago.  Not sure what I did yet, and it's going to be a pain in 
the butt to figure it out.

Original issue reported on code.google.com by Josef.Ro...@gmail.com on 11 Apr 2011 at 1:55

GoogleCodeExporter commented 9 years ago
So, a little progress.  I found the last good revision: 93a8f1af688e .  Now to 
track down where I done goofed.

Original comment by Josef.Ro...@gmail.com on 11 Apr 2011 at 2:45

GoogleCodeExporter commented 9 years ago
The issue was a bug in the entryToFSA function. The function only added an 
alternative cluster arc for the first instance of each cluster in a word. Words 
like 'airmail', which contain more than one instance of the same cluster ('a|i' 
in this case), therefore got no cluster arc for the second occurrence. I've 
fixed this and confirmed that the test-set word accuracy (WACC) is back at the 
previous baseline, an improvement of 1% absolute over the previous build.

Original comment by Josef.Ro...@gmail.com on 11 Apr 2011 at 6:05