-
As stated in the README:
> A text fragment can have arbitrary granularity:
> a paragraph,
> a sentence,
> a portion of a sentence (i.e., a group of words),
> a word, or
> a phoneme (i.e., a sing…
-
I tried to use distil-whisper-v3 in stable-ts, and it works.
However, "distil-large-v2" cannot be used.
Other models cannot be used either (e.g., kotoba-whisper, "kotoba-tech/k…
-
Is it possible to use the original Whisper word-level timestamps and skip forced alignment?
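For context, a minimal sketch of reading Whisper's own word timings. It assumes the result layout produced by openai-whisper's `model.transcribe(audio, word_timestamps=True)`, where each segment carries a `words` list with `word`, `start`, `end`, and `probability`. The helper `flatten_words` and the sample dictionary are hypothetical stand-ins so the snippet runs without a model; whether this project offers a way to skip its own alignment step is not stated here.

```python
from typing import Any


def flatten_words(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect per-word timing entries from a Whisper-style result dict.

    Hypothetical helper: assumes each segment has a "words" list, as
    returned when transcribing with word_timestamps=True.
    """
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append(
                {
                    "word": w["word"].strip(),  # Whisper words keep a leading space
                    "start": float(w["start"]),
                    "end": float(w["end"]),
                    "probability": w.get("probability"),
                }
            )
    return words


# Hand-written stand-in for a real transcription result (illustrative values only).
# With the library installed, it would come from something like:
#   result = whisper.load_model("base").transcribe("audio.wav", word_timestamps=True)
sample_result = {
    "segments": [
        {
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.42, "probability": 0.98},
                {"word": " world", "start": 0.42, "end": 0.9, "probability": 0.95},
            ]
        }
    ]
}

words = flatten_words(sample_result)
for w in words:
    print(f'{w["start"]:.2f}-{w["end"]:.2f} {w["word"]}')
```

These timings come from Whisper's attention-based estimate, so they may be less precise than a dedicated forced aligner.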
-
# Multilingual pronunciation similarity
The goal of this task is to measure whether or not the models in the benchmark encode pronunciation similarity.
This task is part of the meta-task #140.
…
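To make "pronunciation similarity" concrete, here is a minimal sketch of one possible reference measure: a normalized edit distance over phoneme sequences. This is an assumption for illustration, not the metric the task defines; the function names and the example transcriptions are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


def pronunciation_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Illustrative phoneme transcriptions (hypothetical examples).
cat = ["k", "æ", "t"]
bat = ["b", "æ", "t"]
dog = ["d", "ɔ", "ɡ"]

print(pronunciation_similarity(cat, bat))
print(pronunciation_similarity(cat, dog))
```

A model that encodes pronunciation would be expected to place `cat` closer to `bat` than to `dog` in its representation space, mirroring the ordering this reference score gives.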
-
Is there a way to also get confidence scores (at the word or sub-word level) as output?
With decode_beams, it is possible to get the time information for alignment purposes and KenLM state, in additio…
-
Agreed, it's an edge case. I think we can mitigate it after the release by improving the markup of our treebanked texts to use sentence-level alignment rather than word-level, eliminating the spans arou…
-
Hi guys!
I have run everything on a Russian dataset to get alignments. Then I used these alignments to train a speech synthesis model (FastSpeech 2) at the phoneme level. But then I can't figure out …
-
## Story Explanation
### User Story
As an aligner, I want to see how translations have been aligned so that I can understand better how alignment is supposed to be done, and so that reference ot…
-
# Phone/Phoneme segment counting
This task is to count the number of phoneme segments in a given speech sample. This task is essential for evaluating the ability of models in the benchmark to accurat…
-
After experimenting with this repo and its alignment, I made the following changes, which split the subtitle cues more accurately and produce more accurate timing.
```
import whisperx…