-
Could you add native speech-to-speech / audio-to-audio support, with an encoder (tokenizer) and a decoder (back to audio waves)?
I was able to implement a decoder-only model; I first used an audio codec to…
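As a rough sketch of what such an encoder/decoder pair looks like at the interface level: the "encoder" maps fixed-size audio frames to discrete token ids by nearest-neighbour vector quantization, and the "decoder" maps ids back to frames. The codebook here is a toy, hand-written stand-in; real neural codecs (e.g. EnCodec, SoundStream) learn residual VQ codebooks, so treat this only as an illustration of the token interface, not the actual method.

```python
FRAME = 4  # samples per frame (toy value; real codecs use much larger hops)

def encode(wave, codebook):
    """Map each frame of `wave` to the id of the closest codebook vector."""
    tokens = []
    for i in range(0, len(wave) - FRAME + 1, FRAME):
        frame = wave[i:i + FRAME]
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])),
        )
        tokens.append(best)
    return tokens

def decode(tokens, codebook):
    """Concatenate codebook vectors back into a waveform."""
    wave = []
    for t in tokens:
        wave.extend(codebook[t])
    return wave

# Toy codebook: silence, positive pulse, negative pulse.
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [-1.0, -1.0, -1.0, -1.0],
]
wave = [0.1, -0.1, 0.0, 0.05, 0.9, 1.1, 0.95, 1.0]
tokens = encode(wave, codebook)          # [0, 1]
restored = decode(tokens, codebook)      # quantized reconstruction
```

A decoder-only LM then simply predicts the next token id in `tokens`, and the codec's decoder turns the generated ids back into a waveform.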
-
I've been trying to set up a speech model on an Xavier NX, and I've been able to get Tacotron2/WaveGlow running; however, the models use quite a lot of memory. I've been looking to use…
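For context, a quick back-of-envelope estimate shows why the weights alone are heavy on a Jetson-class board, and why fp16 inference is usually the first mitigation. The parameter counts below are approximate, commonly quoted figures (Tacotron2 ≈ 28M, WaveGlow ≈ 88M); substitute the counts from your own checkpoints.

```python
def model_memory_mb(num_params, bytes_per_param):
    """Memory for the weights alone, ignoring activations and workspace."""
    return num_params * bytes_per_param / (1024 ** 2)

tacotron2 = 28_000_000   # approximate public figure
waveglow = 88_000_000    # approximate public figure

fp32 = model_memory_mb(tacotron2 + waveglow, 4)  # ~440 MB of weights
fp16 = model_memory_mb(tacotron2 + waveglow, 2)  # halved under fp16
```

Activations, CUDA context, and framework overhead come on top of this, which is why smaller vocoders or TensorRT engines are popular on the Xavier NX.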
-
**IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:**
- Speech SDK log taken from a run that exhibits the reported issue.
See [instructions on how to take logs](https://docs.microsoft.com/azu…
-
Hello everyone,
I am currently trying to use OpenVoice for German language generation. I have not been able to figure out how this zero-shot speech synthesis is supposed to work. Is there some kind of multila…
-
In addVisemeReceivedEventHandler, I receive event.animation. I want to use Viseme 3D Blend Shapes to drive my 3D Avatar.
Here is an example JSON:
{
  "FrameIndex": 0,
  "BlendShapes": [
    …
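A minimal sketch of consuming that payload: each `VisemeReceived` event carries a JSON chunk whose `BlendShapes` field is a list of frames, each frame a list of blend-shape weights (per the Azure documentation, 55 facial positions at 60 FPS). `apply_to_avatar` below is a hypothetical hook standing in for whatever your 3D engine exposes.

```python
import json

def handle_viseme_animation(animation_json, apply_to_avatar):
    """Feed each blend-shape frame of one animation chunk to the avatar."""
    data = json.loads(animation_json)
    start = data["FrameIndex"]  # offset of this chunk within the utterance
    for i, weights in enumerate(data["BlendShapes"]):
        apply_to_avatar(frame=start + i, weights=weights)

# Tiny two-frame example (real frames carry many more weights).
sample = '{"FrameIndex": 0, "BlendShapes": [[0.0, 0.1], [0.0, 0.2]]}'
frames = []
handle_viseme_animation(sample, lambda frame, weights: frames.append((frame, weights)))
# frames == [(0, [0.0, 0.1]), (1, [0.0, 0.2])]
```

Your engine then maps each weight to the corresponding blend-shape channel on the avatar's face rig at the frame rate the service specifies.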
-
Hello, could you please help me understand the motivation for inserting blank IDs between the input IPA IDs? The implementation can be found in text_mel_datamodule.py, line 216:
def get_text(sel…
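For readers unfamiliar with the pattern: the shape of the operation (if not its motivation) can be sketched as below. A blank id is inserted between, and around, every phone id, so n phones become 2n + 1 tokens. This helper follows the Glow-TTS/VITS-style `intersperse`; the commonly given rationale is that the blanks give the non-autoregressive encoder explicit positions to model transitions between phones, which reportedly improves alignment and pronunciation.

```python
def intersperse(seq, item):
    """Return seq with `item` inserted before, between, and after elements."""
    result = [item] * (len(seq) * 2 + 1)
    result[1::2] = seq
    return result

intersperse([5, 9, 7], 0)  # [0, 5, 0, 9, 0, 7, 0]
```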
-
**Describe the bug**
A subset of the voice models appears to have difficulty processing the three special characters: `` and `&`, even when using entity format (https://learn.microsoft.com/en-us/azur…
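As a workaround sketch while the bug stands: escape the reserved XML characters before embedding user text in SSML, so the service receives proper entities (`&amp;`, `&lt;`, `&gt;`) rather than raw characters. Python's standard library covers this directly:

```python
from xml.sax.saxutils import escape

def to_ssml_text(raw):
    """Escape &, <, > so the text is safe inside an SSML document."""
    return escape(raw)

to_ssml_text("AT&T <research>")  # 'AT&amp;T &lt;research&gt;'
```

Whether a given voice then renders the entity correctly is, per the report above, still model-dependent.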
-
Thanks for your great work. Recently, I have been using the hidden_state output from a large language model as the input to the matcha_tts encoder for training. I have fit a sample tens of thousands of time…
-
The Gradio app displays:
"MetaVoice-1B is a 1.2B parameter base model for TTS (text-to-speech). It has been built with the following priorities:
- Support for long-form synthesis.
![i…
-
I’m not sure if anyone noticed, [but there is a swift-native implementation of Piper](https://github.com/IhorShevchuk/piper-ios-app) that allows it to run on iOS -- and to have Piper models be used as…