-
### Project Name
VidSage
### Description
# VidSage: Video Insights using Graph RAG
https://www.youtube.com/watch?v=IUSCWtB9jWk
VidSage focuses on processing video data, storing it in Azur…
-
AnyGPT is a promising project released 2 months before GPT-4o.
It is a versatile multimodal *LLaMA-based* model that can take as input not only images but also non-transcribed spe…
-
Hello, thank you very much for the absolutely awesome, fantastic, wonderful, great, and beloved Plussub! 🥇 💯
Please, we have a dream: we can download anime in Japanese from streaming sites which have 7…
-
### Project Name
Curio
### Description
## ✨Curio
Curio is a personalised learning platform which uses Retrieval-Augmented Generation (RAG) to generate interactive audio lessons that engage users i…
-
**Describe the bug**
Audio generated for the `gu-IN` locale using the voice `gu-IN-DhwaniNeural` contains about 3 seconds of silence at the end of the audio file. The same generation, performed using `gu-IN-NiranjanNe…
-
I cannot find the pretrained models at this link: http://tts.speech.cs.cmu.edu/document_grounded_generation/cmu_dog/cmu_dog.zip
-
Hello, thanks for your great work! I have encountered several problems during the reproduction process and would like to ask for advice:
1. I tried to generate actions using my own audio and used M…
-
# Text-to-Speech Synthesis
Text-to-Speech is a speech generation task that converts written language into its spoken form.
## Task Objective
Text-to-Speech Synthesis (TTS) is an essential ta…
-
Speech recognition is a standard generation task where the input is speech and the output is text. For now, analysis can be done on the output side only.
* Evaluation metric: word error rate, character …
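
For reference, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between the reference transcript and the system output, divided by the number of reference words. The function names (`edit_distance`, `word_error_rate`) and the whitespace tokenization are assumptions made for illustration, not part of any existing toolkit.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words, strings, ...)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace; no text normalization is applied.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because `edit_distance` works on any sequence, the same helper can be reused at other token granularities. Real evaluations usually normalize case and punctuation before scoring, which this sketch leaves out.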
-
Hey Weilbyte,
I have a friend who has a speech impairment and doesn't like using Discord's TTS because it's a man's voice. I was wondering if you would be willing to support an API where I can request text in t…