-
Hello. I tried using the demo code of CoDi (https://github.com/microsoft/i-Code/tree/main/i-Code-V3) to reproduce results on the AudioCaps dataset. However, I was unable to achieve the results reporte…
-
Hi, I have a similar problem to https://github.com/microsoft/CLAP/issues/24, but I'm using audio shorter than 6 seconds.
MWE:
```python
from msclap import CLAP
import torch
import subprocess
…
```
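One possible workaround (an assumption about the cause, not confirmed msclap behavior) is that CLAP expects a fixed-duration input window, so very short clips may need to be zero-padded before embedding. A minimal padding sketch with NumPy; the 7-second target is a placeholder, not a documented msclap constant:

```python
import numpy as np

def pad_or_trim(waveform: np.ndarray, sample_rate: int, target_seconds: float) -> np.ndarray:
    """Zero-pad (or trim) a mono waveform to exactly target_seconds."""
    target_len = int(sample_rate * target_seconds)
    if len(waveform) >= target_len:
        return waveform[:target_len]
    padding = np.zeros(target_len - len(waveform), dtype=waveform.dtype)
    return np.concatenate([waveform, padding])

# Example: a 2-second clip at 44.1 kHz padded to a hypothetical 7-second window
clip = np.random.randn(2 * 44100).astype(np.float32)
padded = pad_or_trim(clip, 44100, 7.0)
print(padded.shape)  # (308700,)
```

The padded array could then be written back to a temporary file (or tensor) before calling the CLAP embedding API.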
-
Hi Unsloth!
I came across this interesting model on reddit: https://www.reddit.com/r/LocalLLaMA/comments/1ez8rmu/llama31_just_got_ears_early_experiments/
It allows Text and Audio as input, and o…
-
I'm always frustrated when I want to quickly learn the meaning of a word or phrase but have to look it up manually and read the definition. It would be much more convenient if I could just input a str…
-
**Summary:**
Currently, the project relies on YouTube’s captioning system for lyrics extraction. However, only a limited number of YouTube videos have captions enabled, restricting the number of song…
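One way to reduce the dependency on YouTube captions is a fallback chain: try captions first, then fall back to another source such as local speech-to-text. A hedged sketch with stubbed sources (the function names and simulated returns are hypothetical, not the project's actual API):

```python
from typing import Callable, Optional

# Hypothetical sources; real implementations would call the YouTube captions
# endpoint and an ASR model respectively. Each returns lyrics or None on failure.
def from_captions(video_id: str) -> Optional[str]:
    return None  # simulate a video with captions disabled

def from_asr(video_id: str) -> Optional[str]:
    return "transcribed lyrics for " + video_id  # simulate an ASR transcript

def extract_lyrics(video_id: str, sources: list[Callable[[str], Optional[str]]]) -> str:
    """Try each lyrics source in order; return the first non-empty result."""
    for source in sources:
        result = source(video_id)
        if result is not None:
            return result
    raise LookupError(f"no lyrics source succeeded for {video_id}")

print(extract_lyrics("abc123", [from_captions, from_asr]))  # transcribed lyrics for abc123
```

Ordering cheap sources (captions) before expensive ones (ASR) keeps the common case fast while covering caption-less videos.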
-
New TTS provider
```python
import requests
import json
import time
from pathlib import Path
from typing import Generator
from playsound import playsound
class FailedToGenerateResponseError…
```
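The rest of the snippet is truncated, but the imports suggest the usual provider shape: a custom error plus a class that requests audio and writes it to disk with retries. A stdlib-only skeleton under those assumptions (the endpoint, class name, and stubbed request are placeholders, not the actual provider):

```python
import time
from pathlib import Path

class FailedToGenerateResponseError(Exception):
    """Raised when the TTS endpoint does not return usable audio."""

class NewTTSProvider:
    def __init__(self, api_url: str = "https://example.com/tts", retries: int = 3):
        # `api_url` is a placeholder, not a real endpoint.
        self.api_url = api_url
        self.retries = retries

    def _request_audio(self, text: str) -> bytes:
        # A real provider would POST `text` to self.api_url (e.g. with requests)
        # and return the response body; stubbed here for a runnable sketch.
        return b"RIFF" + text.encode()

    def tts(self, text: str, out_path: Path) -> Path:
        """Fetch audio for `text`, retrying with exponential backoff."""
        for attempt in range(self.retries):
            audio = self._request_audio(text)
            if audio:
                out_path.write_bytes(audio)
                return out_path
            time.sleep(2 ** attempt)
        raise FailedToGenerateResponseError(f"no audio after {self.retries} attempts")
```

Playback (e.g. via `playsound`) would then take the returned path.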
-
Currently, the `Feature Extraction` task includes both models for audio and text feature extraction (it is officially placed under the NLP modality). I think it would be nice to have a new task for `A…
-
Great job! I want to know how to get pseudo pairs when I choose one modality (for example, image) as a starting point. I can use the audio–image and image–text models to retrieve audio and text, but how ca…
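One common approach (a sketch of the general retrieval idea, not necessarily this repo's exact procedure) is to use the shared modality as a bridge: embed the image, retrieve its nearest neighbor in an audio bank and in a text bank, and treat the two retrieved items as a pseudo audio–text pair. A toy version with random embeddings standing in for the two models' outputs:

```python
import numpy as np

def nearest(query: np.ndarray, bank: np.ndarray) -> int:
    """Index of the bank row most cosine-similar to the query vector."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return int(np.argmax(b @ q))

# Toy embedding banks in a shared space (stand-ins for the audio–image
# and image–text models' embedding outputs).
rng = np.random.default_rng(0)
audio_bank = rng.normal(size=(100, 64))
text_bank = rng.normal(size=(100, 64))

image_emb = rng.normal(size=64)
# Bridge through the image: its nearest audio and nearest text together
# form one pseudo audio–text pair.
pseudo_pair = (nearest(image_emb, audio_bank), nearest(image_emb, text_bank))
```

In practice the audio and text would come from separately trained encoders, so a threshold on the similarity scores helps filter out low-confidence pairs.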
-
### Description
The goal is to develop a Tibetan text-to-speech (TTS) model that can convert Tibetan text into Tibetan speech. This project involves training a TTS model using filtered good audio qual…
-
How about adding Text-to-Speech alternatives to OpenAI, such as Deepgram and fish.audio? Similarly, other LLMs could be added as well.
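Supporting multiple backends usually comes down to a common interface that each service implements. A minimal sketch of that shape (class names and stubbed bodies are hypothetical; real implementations would call each service's API):

```python
from abc import ABC, abstractmethod

class TTSProvider(ABC):
    """Common interface so OpenAI, Deepgram, fish.audio, etc. are interchangeable."""
    @abstractmethod
    def synthesize(self, text: str) -> bytes: ...

class DeepgramTTS(TTSProvider):
    def synthesize(self, text: str) -> bytes:
        return b"deepgram:" + text.encode()  # stub; real code calls Deepgram's API

class FishAudioTTS(TTSProvider):
    def synthesize(self, text: str) -> bytes:
        return b"fish:" + text.encode()  # stub; real code calls fish.audio's API

def speak(provider: TTSProvider, text: str) -> bytes:
    """Caller code depends only on the interface, not a concrete backend."""
    return provider.synthesize(text)

audio = speak(DeepgramTTS(), "hello")
```

The same pattern would extend to swapping LLM backends: one abstract interface, one concrete class per vendor.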