Closed (joanise closed this issue 4 years ago)
I think I fixed this here: 0aeedd5be38aed1e2b39da60a8135d0f65ad7813
I ran the alignment and it looks like a bunch of combining characters got through the g2p, like \u0300 (combining grave) and \u0302 (combining circumflex), so there are probably some changes needed in the g2p.
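One way to spot stray combining characters like these in g2p output is to scan for code points with a nonzero combining class. This is a minimal sketch using only the Python standard library; `find_combining` and the sample string are hypothetical, not part of g2p or Studio.

```python
import unicodedata


def find_combining(text):
    """Return (index, code point, Unicode name) for each combining mark in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.combining(ch)  # nonzero canonical combining class
    ]


# Hypothetical output string containing decomposed accents.
sample = "e\u0300 o\u0302"
for hit in find_combining(sample):
    print(hit)
```

If the marks come from decomposed input, `unicodedata.normalize("NFC", text)` composes them with their base letters where a precomposed form exists. Whether that is the right fix for the mapping here is an assumption.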
I believe I also just fixed this on the dev.fra branch of g2p here: https://github.com/roedoejet/g2p/commit/93d0781f22dfd0ed0fdc87cd97c089775fba6a6c
It's now doing the alignment but I'm getting ERROR - Alignment produced a different number of segments and tokens, please examine dictionary and input audio and text.
Just tested with the dev.fra branch of g2p, and I get the same error. Thanks for fixing my French g2p and the RAS bug.
I found the problem. Word <w>s</w> goes to nothing because of the g2p rule s,,,\s|$. So the .dict file goes from token t0b0d0p10s0w42 to t0b0d0p10s0w44, skipping t0b0d0p10s0w43, which is empty.
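To illustrate why that rule erases the word, here is a rough regex approximation: delete "s" when it is followed by whitespace or the end of the string. This is only a sketch of the rule's effect, not how g2p actually compiles rules; `apply_rule` and `apply_guarded` are hypothetical names.

```python
import re

# Approximation of the rule: "s" becomes nothing before whitespace or end of string.
FINAL_S = re.compile(r"s(?=\s|$)")

# One possible guard (an assumption, not the actual fix): only drop "s"
# when it follows another word character, so a lone "s" survives.
GUARDED_FINAL_S = re.compile(r"(?<=\w)s(?=\s|$)")


def apply_rule(text):
    return FINAL_S.sub("", text)


def apply_guarded(text):
    return GUARDED_FINAL_S.sub("", text)


print(repr(apply_rule("amis")))     # final "s" is dropped
print(repr(apply_rule("s")))        # the whole word vanishes
print(repr(apply_guarded("s")))     # the lone "s" is kept
```

When the word is tokenized on its own, as with the "s" in "s'efforcent", the entire input matches the rule and nothing is left for the dictionary entry.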
Two things here: 1) I should fix my rule not to erase a stand-alone "s", especially since that's a real word in French, e.g., in "s'efforcent".
2) Studio should gracefully handle a word that vanishes, either with an explicit error message flagging it, if we don't want to support it, or with a way to align despite it if we do want to support it.
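As a sketch of the explicit-error option in point 2, a check run before alignment could flag any token whose pronunciation came out empty. The function name, the `(token_id, pronunciation)` pair format, and the sample pronunciations are all hypothetical; this is not Studio's actual code.

```python
def find_empty_tokens(entries):
    """Return the ids of tokens whose pronunciation is empty or whitespace only."""
    return [token_id for token_id, pron in entries if not pron.strip()]


# Hypothetical entries; the pronunciations are made up for illustration.
entries = [
    ("t0b0d0p10s0w42", "l e"),
    ("t0b0d0p10s0w43", ""),
    ("t0b0d0p10s0w44", "e f o r s"),
]

empty = find_empty_tokens(entries)
if empty:
    print("Words with no g2p output: " + ", ".join(empty))
```

Reporting the token ids up front would point straight at the offending word instead of surfacing later as a segment/token count mismatch.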
Nice find! OK, if you push that change to g2p dev.fra I'll merge it with master. Will you turn point 2 into an issue?
Sure, but...
1) It might take me some more time to fix this. Go ahead and merge dev.fra now, I don't know when I'll succeed in fixing it. I can work on master when I'm ready to figure it out. I've pushed two other unrelated small fixes there too.
2) Sure, I'll turn that into an issue.
On branch Studio:dev.g2p, g2p:master, OpenSamples:master, all up to date as of now outputs: