-
After a certain segment, all subsequent recognized texts are incorrect:
```
from openai import OpenAI
client = OpenAI(api_key="cant-be-empty", base_url="http://192.168.31.100:8000/v1/")
…
```
-
In stage 1, only ASR and TTS are used.
ASR is Audio -> Text, so the loss is calculated only for text tokens, not for audio tokens, right?
TTS is Text -> Audio, but mini-omni outputs text and audio sim…
-
It could be helpful to control **speaking style** using prompt audio, and control **emotion** using instruction text. I attempted zero-shot inference by including the instruction text in the prompt us…
-
Dear Hugo et al,
I have a gradio app that reads some audio and produces text and analysis graphs of it. (It doesn't actually change the audio.)
It was looking like I could use PyHARP v0.1.0 to in…
-
## As a visually-impaired visitor to the site, I need to be able to access an audio-described version of videos put on pages through the Legacy Embedded WYSIWYG so that I can experience the content of…
-
I am using Windows 10 LTSC. When I execute the command `edge-playback --text "Hello, world!"`, the generated audio plays in the MPV window.
1. For MP3 files, the pop-up window seems unnecessary and…
-
I prefer to use a script and CLI to generate audio with ChatTTS rather than opening the webUI, and I want these features in my script:
![webui](https://github.com/user-attachments/assets/fe35822c-656a…
-
```
path = r"D:\Project\Python_Project\FasterWhisper\large-v3"
model = WhisperModel(model_size_or_path=path, device="cuda", local_files_only=True)
segments, info = model.transcribe("audio.wav",…
```
-
I just got the following error; it seems to be hitting the limit in Coqui TTS.
```
Chapter 50: 20%|████████████▌ | 1/5 [10:51
```
-
### Discussed in https://github.com/langchain-ai/langchain/discussions/27404
Originally posted by **kodychik** October 16, 2024
### Checked
- [X] I searched existing ideas and did not find …