-
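Mini-Omni offers a great idea: combining the LLM with TTS. Compared with waiting for the LLM's streamed output and then passing it to TTS for synthesis, this should, in theory, reduce latency significantly.
But on the input side, compared with calling ASR to get text and then using that text as the model input, does feeding the encoded speech directly into the model offer any advantage in quality or latency?
I ask mainly because, in human-machine dialogue, if we want to reduce response latency, optimizing VAD is a major difficulty, such as…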
-
### What would you like to see?
Hello everyone,
First of all, thank you for this superb project.
Would it be possible to use LocalAI for Whisper? Currently the model is Xenova Whisper which uses th…
-
### What happened?
When trying to convert an audio recording into text, the process closes and stops completely; it no longer appears in the Task Manager.
All the details are in the video.
### Steps to reprod…
-
Which pretrained text-to-audio generator does FoleyCrafter use?
Thanks for answering!
-
What does work
- The code in root
What doesn't work
- The code in VoiceServer.
Note that engine.cpp does some neat little things to call our Python file directly in voices/.
We don't need to …
-
If I just want to apply text-guided audio-to-audio style transfer to long text, is it feasible to transition seamlessly from one audio to another as the prompt changes?
-
The verbose flag is set to `false`, yet too much is still logged, for example:
```
[dev:server]
[dev:server] stderr--- whisper_init_from_file_with_params_no_state: loading model from './models/ggml-base.bin…
-
**Describe the bug**
On version 4.2.0, the text for audio devices is pixelated. In 4.2.0 beta, the text was fine.
**Screenshots**
![pixelation](https://github.com/vkohaupt/vokoscreenNG/assets/9…
-
**IN ORDER TO ASSIST YOU, PLEASE PROVIDE THE FOLLOWING:**
- Speech SDK log taken from a run that exhibits the reported issue.
[azure_speeck_sdk.zip](https://github.com/user-attachments/files/1662…
-
Give the user the ability to click an icon, talk, and have their voice transcribed to text
- [x] Create small use case example
- [ ] Update IAM permissions to allow Transcribe access
- [ ] In…