Closed ruohoruotsi closed 2 years ago
Closed by accident. Looking into this. It would be easy to make a method like `xsampa_list` that does not use `strict_trans`. But I think that fixing the tonal support for Yoruba would address this problem.
Thanks for responding so quickly! Do you mean fixing tonal support with `strict_trans` for Yorùbá? Cos tonal support works great with regular `transliterate`. In my example above (sorry if this is dead obvious), `O O_L r u~` captures the low tone on the second `o` perfectly and, as far as I can tell, does a consistently good job on large sections of my lexicon.
I need to read about what `strict_trans` is doing, to get more context on how to fix it 🤔 👍 🙏
Sorry to have let this drop. I'll try to look into it in the next day or two. It shouldn't be too hard to fix, but there is a technical problem to be addressed with regard to the tones.
Thanks David, I made a workaround by cloning the XSampa class within my g2p and heavily modifying the `ipa2xs` function. Initially I thought I could just get away with a space-delimiting `' '.join()`, but in the end I had to handle these `special_phones = ['_L', '_H', '~']` (low tone, high tone and nasalization) and ensure they were not spaced, but attached to their "parent" phoneme ... so I have a little extra post-processing to ensure these are correctly placed ... it's a bit hacky, but works, as a take-1.
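For anyone following along, here is a minimal sketch of that kind of post-processing. It is not the modified `ipa2xs` itself: the function name `attach_special_phones` is hypothetical, and it assumes the input is the result of a naive `' '.join()` in which every symbol, including the tone and nasalization markers, ended up as its own token.

```python
import re

# Markers that must stay glued to the preceding ("parent") phoneme.
SPECIAL_PHONES = ["_L", "_H", "~"]  # low tone, high tone, nasalization

# Matches the whitespace in front of any special phone, capturing the marker.
_DETACHED = re.compile(
    r"\s+(" + "|".join(re.escape(p) for p in SPECIAL_PHONES) + r")"
)


def attach_special_phones(spaced):
    """Remove the space before each special phone in a space-delimited string.

    Hypothetical helper: `spaced` is assumed to be a naive ' '.join() of
    single X-SAMPA symbols, e.g. "O O _L r u ~".
    """
    return _DETACHED.sub(r"\1", spaced)


print(attach_special_phones("O O _L r u ~"))
```

A marker at the very start of the string has no preceding whitespace, so it is left where it is rather than attached to anything; that makes stray tones easy to spot afterwards.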
The main changes are within the `IO HAVOC` comments. Incidentally, in testing out my code, I found some dodgy text that needed to be fixed in my corpora (tones without base chars & other oddities).
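A rough way to flag the "tones without base chars" case, in case it helps someone else clean a corpus. This is only a sketch under assumptions: the helper name `orphan_tone_positions` is made up, and it only looks at the combining grave and acute accents, treating a tone mark as orphaned when the character cluster it belongs to does not start with a letter.

```python
import unicodedata

# Assumption: only these two combining marks are treated as tone marks here.
TONE_MARKS = {"\u0300", "\u0301"}  # combining grave (low), combining acute (high)


def orphan_tone_positions(text):
    """Return indexes (into the NFD form of `text`) of tone marks with no base letter."""
    decomposed = unicodedata.normalize("NFD", text)
    positions = []
    has_base = False  # does the current cluster start with a letter?
    for i, ch in enumerate(decomposed):
        if unicodedata.combining(ch):
            # Combining marks (underdot, tones, tilde) extend the current cluster.
            if ch in TONE_MARKS and not has_base:
                positions.append(i)
        else:
            # Any non-combining character starts a new cluster.
            has_base = ch.isalpha()
    return positions


# A grave accent stranded after a space, with no letter to carry it.
print(orphan_tone_positions("a \u0300b"))
```

Note that the positions refer to the NFD-normalized string, not the original, so normalize the text the same way before using them to locate the problem.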
In any case, I used the generated lexicon from this ☝️ ☝️ & `epitran` to make a Yorùbá ASR, using Kaldi, only GMM-HMM triphone thus far, since I have tiny data, but everything more or less worked ... WER is still high 80% but to be expected at this stage.
Hi @dmort27, thanks for your work on epitran. I'm using it for Yorùbá g2p to generate XSampa spellings. It's brilliant! I'm showing off my usage below with a word with underdot diacritics & tonal marks.
This works fine and great ... however I need my spelling like `O O_L r u~`. So I tried:

1. `trans_delimiter` works but only on IPA, and if I pass that space-delimited IPA string to `xs.ipa2xs`, the delimiter gets removed ... see 3. below.
2. `xsampa_list`, which looks like what I want, but unfortunately, as someone else has noted in this issue, this function uses `strict_trans`, which throws out all the tonal information 😬
3. `return ''.join(xsampa)`, so no chance of delimiting, unless I modify this function to support an optional delimiter to join the list with.

Sooooooo before I get too excitable and start hacking up stuff, I wanted to ask you if there's something I've missed or misused in order to get "space delimited" XSampa phonetic spellings? Thank you in advance 🙏
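To make the "optional delimiter" idea from point 3 concrete, here is a standalone sketch. It is not epitran's code: `ipa_to_xsampa` and the tiny `IPA_TO_XSAMPA` table are made up for illustration, and the IPA input is assumed (chosen so that it yields the spelling above). The point is only that a `delimiter` argument defaulting to `''` would keep the current behaviour while allowing a spaced join, with combining diacritics kept on their base phone.

```python
import unicodedata

# Tiny illustrative IPA -> X-SAMPA table; a real one is far larger.
IPA_TO_XSAMPA = {
    "\u0254": "O",   # open-mid back rounded vowel
    "r": "r",
    "u": "u",
    "\u0300": "_L",  # combining grave: low tone
    "\u0301": "_H",  # combining acute: high tone
    "\u0303": "~",   # combining tilde: nasalization
}


def ipa_to_xsampa(ipa, delimiter=""):
    """Map IPA to X-SAMPA, joining phones with `delimiter` ('' = old behaviour)."""
    phones = []
    for ch in unicodedata.normalize("NFD", ipa):
        symbol = IPA_TO_XSAMPA.get(ch, ch)  # unknown symbols pass through unchanged
        if unicodedata.combining(ch) and phones:
            # Diacritics attach to the preceding phone instead of starting a new one.
            phones[-1] += symbol
        else:
            phones.append(symbol)
    return delimiter.join(phones)


ipa = "\u0254\u0254\u0300ru\u0303"  # assumed IPA input for the example
print(ipa_to_xsampa(ipa))                 # undelimited, as today
print(ipa_to_xsampa(ipa, delimiter=" "))  # space-delimited
```

Grouping before joining avoids any clean-up pass afterwards, at the cost of assuming every diacritic is a Unicode combining character.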