Closed: cainesap closed this issue 7 years ago
Hi Andrew, evaluate.py certainly needs some work to handle that error properly. I'll look into it.
But there is another problem: your segmented.puddle.txt should use " " as the word separator, as gold.txt does, not ";eword".
Do you remember how you obtained this segmentation output?
My apologies! No need to fix anything. You are right: I was feeding the output of 'phonemizer' to 'wordseg-puddle' rather than the output of 'wordseg-prep'. My mistake, I'm sorry. I will script this to avoid the same problem in future. With the right file, all is well again!
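A guard along these lines could catch the mix-up before evaluation. This is only a minimal sketch, not part of wordseg: the function names `find_leftover_tags` and `assert_no_tags` are hypothetical, and the pattern assumes boundary tags of the form ";eword" (or similar, e.g. ";esyll"), as seen in this thread. It only inspects lines of text and does not call any wordseg command.

```python
import re

# Assumption: boundary tags look like ";eword" (and similar, e.g. ";esyll").
# This pattern is illustrative and not taken from the wordseg source.
TAG_PATTERN = re.compile(r";e[a-z]+")


def find_leftover_tags(lines):
    """Return (line_number, tag) pairs for every boundary tag still present."""
    leftovers = []
    for number, line in enumerate(lines, start=1):
        for match in TAG_PATTERN.finditer(line):
            leftovers.append((number, match.group()))
    return leftovers


def assert_no_tags(lines):
    """Raise ValueError if any line still contains a boundary tag."""
    leftovers = find_leftover_tags(lines)
    if leftovers:
        number, tag = leftovers[0]
        raise ValueError(
            f"line {number} still contains {tag!r}; "
            "was the segmenter fed tagged text instead of prepared text?"
        )
```

For example, calling `assert_no_tags(open("segmented.puddle.txt", encoding="utf-8"))` before running the evaluation would fail early if the segmenter had been given the tagged file by mistake.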
Hello,
The wordseg pipeline works fine for me with ARPAbet input (thanks again, great resource!).
However, with IPA input (e.g. from phonemizer / espeak), I encounter a problem:
If I run
cat segmented.puddle.txt | wordseg-eval gold.txt > eval.puddle.txt
I see the following error:
I wonder if it's to do with space separation in the output of wordseg-puddle? (I happen to be using puddle)
Line 1 of the phonemized file looks like this:
Which means gold.txt looks like this:
And prepared.txt like this:
However, segmented.puddle.txt has inconsistent spacing around ;eword delimiters:
Is this the cause of the eval problem?
Andrew