-
The source recordings are split into 10-second chunks, right? This makes it harder to identify phonemes at the edges of these 10-second chunks: not only is phonemic context lacking (at the 'left' for …
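One possible mitigation (purely a sketch under assumed parameters, not necessarily what the tool actually does) is to cut overlapping chunks, so that every boundary region appears with full phonemic context in at least one chunk:

```python
def chunk_spans(total_sec, chunk_sec=10.0, overlap_sec=1.0):
    """Return (start, end) spans covering total_sec seconds of audio.

    Consecutive chunks overlap by overlap_sec, so a phoneme falling at
    the edge of one chunk sits well inside its neighbour.  The chunk
    and overlap lengths here are illustrative, not the tool's defaults.
    """
    step = chunk_sec - overlap_sec
    spans = []
    start = 0.0
    while start < total_sec:
        spans.append((start, min(start + chunk_sec, total_sec)))
        start += step
    return spans
```

With a 1-second overlap, a phoneme at second 10 is at the right edge of the first chunk but one second inside the second chunk, so the recognizer sees it with context at least once; predictions in the overlapped region would then need to be merged or deduplicated.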
-
maybe options to output SubRip (.srt) subtitle files,
with an option to split lines by word, line, or phoneme (consonant-vowel),
maybe also IPA/phonetic output,
and syntax coloring/formatting (f…
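A minimal sketch of what such SubRip output could look like, given per-unit timings; the tuple layout and function names here are hypothetical, not an existing API:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(units):
    """Render (start_sec, end_sec, text) tuples as SRT cues.

    Each tuple could be one word, one line, or one phoneme, depending
    on the chosen split granularity.
    """
    cues = []
    for i, (start, end, text) in enumerate(units, 1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)
```

The same cue structure works at any granularity; only the length and density of the `units` list changes between word-level and phoneme-level output.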
-
I’m sure you’ve long been aware of this issue and thought deeply about it!
Affixed forms of words in the exception lists [don't get transformed](https://www.tecendil.com/?q=ache%20aches%20aching%20ach…
-
One possible scenario for using Persephone (though not the only one) is to use a linguist's field transcriptions to train an acoustic model, then transcribe new (previously untranscribed) audio files.…
-
[Here I will omit my suspicions on Wiktionary data in general and on their transcriptions in particular.]
I have noticed that the diphthong **ie** is treated somewhat oddly in the data, as yot (j) alm…
-
Hi. I was able to train an Italian model almost perfectly, with the exception of a few words that are intrinsically ambiguous without context. Since your model is similar to the BERT transformer, what do …
-
Hi Nickolay, I tried to find a description of some criteria as to how the dictionary is transcribed (from other sources). One feature that strikes me as particularly odd is the entries marked with [th…
-
https://github.com/persephone-tools/persephone/blob/cfe5096e929edcb45b0eb8133c873b9f6e8361f0/persephone/datasets/na.py#L531
But in the base class we have:
https://github.com/persephone-tools/per…
-
-
Hi,
can this tool do the state_alignment to split out each phone?
Thanks