-
### Description
The goal is to develop a Tibetan text-to-speech (TTS) model that can convert Tibetan text into Tibetan speech. This project involves training a TTS model using filtered good audio qual…
-
I wonder if DiffNorm is designed to normalize target speech units. Why is the src-feat required during the training of VAE and Diffusion in the provided script? I read the paper and didn't see any me…
-
Hi,
I'm currently trying to replicate the performance of Qwen2-Audio on the AIR Bench. However, I noticed that the repository at [AIR-Bench](https://github.com/OFA-Sys/AIR-Bench/blob/main/score_cha…
-
### Feature Description
I would love to see how the AI SDK can handle text-to-speech from OpenAI. According to the documentation, TTS output can be streamed.
https://platform.openai.com/docs/guides/text-to-speech/strea…
-
### Describe the bug
1. The TTS Speech service seems to limit audio files to a maximum length of 10 minutes. This applies regardless of whether the account is free or paid - https://learn.microsoft.com/en-us/azure/ai…
-
It looks like a file is missing.
Unrecognized model in D:\LIUGEGE\ComfyUI\models\Joy_caption_alpha\text_model. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, a…
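The error message says the loader expects a `model_type` key in the folder's `config.json`. A minimal sketch for checking that by hand, assuming only the Python standard library; the helper name `check_model_type` is hypothetical and is not part of any library, and it does not cover the alternative of matching a known string in the folder name:

```python
import json
import tempfile
from pathlib import Path


def check_model_type(model_dir):
    """Return a short diagnosis of config.json inside model_dir (hypothetical helper)."""
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return "missing config.json"
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return "config.json is not valid JSON"
    # The error message asks for a top-level `model_type` key.
    if not isinstance(config, dict) or "model_type" not in config:
        return "config.json has no model_type key"
    return f"model_type = {config['model_type']}"


if __name__ == "__main__":
    # Demonstration on a throwaway directory; point it at the real model folder instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(check_model_type(tmp))
        (Path(tmp) / "config.json").write_text('{"hidden_size": 8}', encoding="utf-8")
        print(check_model_type(tmp))
```

If the check reports a missing file or a missing key, the download of the `text_model` folder is probably incomplete, which would match the guess that a file is missing.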
-
Can speech recognition only be used after generation?
-
### System Info
- `transformers` version: 4.45.1
- Platform: Linux-5.10.225-213.878.amzn2.x86_64-x86_64-with-glibc2.31
- Python version: 3.11.9
- Huggingface_hub version: 0.25.1
- Safetensors ver…
-
### Initial Checks
- [X] I have searched GitHub for a duplicate issue and I'm sure this is something new
- [X] I have read and followed [the docs & demos](https://github.com/modelscope/modelscope-age…
-
## Computer Vision:
- [x] Add Depth Estimation pipeline
- [ ] Add Image Classification pipeline
- [ ] Add Image Segmentation pipeline
- [ ] Add Mask Generation pipeline
- [ ] Add Object Detecti…