-
How can I choose the audio bitrate for each resolution? For example:
1080: 192k audio
720: 128k audio
480: 96k audio
When I change hls_group_id, it appears differently in the player.
![Captura de pantalla…
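
One possible way to pair each resolution with its own audio bitrate, assuming the stream is packaged with FFmpeg's HLS muxer (the question does not say which tool is in use), is to mux one audio rendition into each video variant through `-var_stream_map`. The sketch below only builds the argument list. The `RENDITIONS` table, the `build_hls_args` helper, and the output file names are hypothetical names chosen for illustration.

```python
# Sketch only: assumes FFmpeg's HLS muxer; table, helper, and file names are illustrative.
RENDITIONS = [
    {"height": 1080, "audio_bitrate": "192k"},
    {"height": 720, "audio_bitrate": "128k"},
    {"height": 480, "audio_bitrate": "96k"},
]


def build_hls_args(source, renditions=RENDITIONS):
    """Build an ffmpeg argument list with one video+audio pair per rendition."""
    args = ["ffmpeg", "-i", source]
    # Map the first video and first audio stream of the input once per rendition.
    for _ in renditions:
        args += ["-map", "0:v:0", "-map", "0:a:0"]
    # Per-output-stream settings: scale video i, set the bitrate of audio i.
    for i, rendition in enumerate(renditions):
        args += [f"-filter:v:{i}", f"scale=-2:{rendition['height']}"]
        args += [f"-b:a:{i}", rendition["audio_bitrate"]]
    # Pair video i with audio i so each variant playlist carries its own audio.
    stream_map = " ".join(f"v:{i},a:{i}" for i in range(len(renditions)))
    args += ["-f", "hls", "-var_stream_map", stream_map]
    args += ["-master_pl_name", "master.m3u8", "stream_%v.m3u8"]
    return args


if __name__ == "__main__":
    print(" ".join(build_hls_args("input.mp4")))
```

Because each variant carries muxed audio in this layout, no separate audio group is declared, which may avoid the player showing the group as an extra selectable audio track.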
-
The note in [WCAG 1.2.3](https://www.w3.org/WAI/WCAG21/Understanding/audio-description-or-media-alternative-prerecorded.html)
on the differences between text transcript and audio description is highl…
-
I am looking at https://github.com/NVIDIA/audio-flamingo/blob/main/inference/inference_examples.py and couldn't find any example that interleaves multiple audio clips and texts. However, I…
-
New TTS provider
```python
import requests
import json
import time
from pathlib import Path
from typing import Generator
from playsound import playsound
class FailedToGenerateResponseError…
-
Currently, the `Feature Extraction` task includes models for both audio and text feature extraction (it is officially placed under the NLP modality). I think it would be nice to have a new task for `A…
-
Great job! I want to know how to get pseudo pairs when I choose one modality (for example, image) as a starting point. I can use audio-image and image-text models to retrieve audio and text, but how ca…
-
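- Version: V2
- Splitting method: no splitting in the webui; the API is passed no split delimiter, so it does not split either
- All other parameters are identical
- Situation: the WAV audio produced by the API is noisier than the audio from the webui
- Testing: adding the webui's audio normalization to api.py did not help either; the audio generated in the webui is still the best, and even copying all of the webui's inference code over does not help
- Expected answer: what should api.py do in order to match the webui…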
OriX0 updated
1 month ago
-
We need to add support for audio file inputs as documents in our pipeline system. This will allow users to process audio files (e.g., MP3) and automatically transcribe them using services like OpenAI'…
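
A minimal sketch of how such routing might look, assuming documents are loaded by file path and the transcription backend is injected as a callable. `AUDIO_EXTENSIONS`, `load_document`, and the returned dictionary shape are hypothetical and not part of the existing pipeline.

```python
from pathlib import Path
from typing import Callable

# Assumed set of audio extensions; extend it to match the formats actually supported.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def load_document(path: str, transcribe: Callable[[Path], str]) -> dict:
    """Return a document record, transcribing the file first if it is audio."""
    file_path = Path(path)
    if file_path.suffix.lower() in AUDIO_EXTENSIONS:
        # Delegate to the injected transcription service (hypothetical interface).
        return {"source": file_path.name, "kind": "audio", "text": transcribe(file_path)}
    # Non-audio files are read as plain UTF-8 text.
    return {"source": file_path.name, "kind": "text", "text": file_path.read_text(encoding="utf-8")}
```

Passing the transcriber in as a parameter keeps the loader independent of any one speech-to-text service and lets tests substitute a stub.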
-
Hi, I have a design proposal for customisable and extendable message types. There are pros and cons to this design. At first glance it seems to me that it might not be a breaking change.
Origin: It…
-
### Description
The goal is to develop a Tibetan text-to-speech (TTS) model that can convert Tibetan text into Tibetan speech. This project involves training a TTS model using filtered good audio qual…