-
In the paper entitled: "Improving Massively Multilingual ASR with Auxiliary CTC Objectives" you proposed a method where early encoder layers where conditioned to identify the language of speech, while…
-
-
Hey hey @KdaiP,
Thanks for open-sourcing your implementation. I'm VB, I work in the open source audio team at Hugging Face. I'd love to know more and see how we can potentially help you with your e…
-
### Model description
The TRILLsson models are described in the publication TRILLsson: Distilling Universal Paralingistic Speech Representations. From audio, they generate generally-useful paralingui…
-
hello i am attempting to create text to speech with bark on intel a770 but it takes around 60 seconds to generate audio is that normal ? is there a way to make it faster like few seconds ? https://git…
-
I'm trying to use the elevenlabs python library with stream(), and it works fine with eleven_monolingual_v1 but fails with eleven_monolingual_v2. However I can't find anything in the documentation th…
-
![image](https://github.com/user-attachments/assets/fda027e3-f1c9-4bc8-b7d3-af5fee31cb97)
Section 1, Speech-Analysis, Word Frequency Tracking, Taser, Java/Kotlin, Android app, Speech Pattern Analysis…
-
Hi,
I want to know is there anyway to change the input text to a text stream to fit the language model's output.
threading.Thread(
target=play_text_to_speech,
…
-
# Task Name
Japanese Pitch Accent Word Recognition
## Task Objective
This task aims to recognize words in Japanese audio that have different meanings based on pitch accent. Japanese pitch accent …
-
We have trained StyleTTS2 model for Hindi language. Initially we trained PL-bert for Hindi considering Espeak phonemizer and Indicbert tokenizer. Then we utilized that newly trained Hindi PLbert by re…