-
Objective:
Improve the ability to align text and audio deltas for smoother playback and interruption handling.
Proposed solutions (in order of preference):
- Implement corresponding event_ids bet…
-
I'm trying to run Multimodal RAG for processing videos using OpenAI GPT-4V and the LanceDB vector store:
https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/multi_modal/multi_modal_video…
-
Hi, thank you for your excellent work. As we know, in text-to-text models, we can perform Retrieval-Augmented Generation (RAG). For more clarification, I have my personal data in text format, but to m…
-
In stage 1, only ASR and TTS are used.
ASR is Audio -> Text, so the loss is only calculated for text tokens, not for audio tokens, right?
TTS is Text -> Audio, but mini-omni outputs text and audio sim…
-
### What would you like to see?
Hello everyone,
First of all, thank you for this superb project.
Would it be possible to use LocalAI for Whisper? Currently the model is Xenova Whisper which uses th…
-
Over the last week or so, text-to-speech stopped working on my device. I usually use Alloy for audio generation, and now that voice, along with most other English voices, displays this upon attempting …
-
After a certain segment, all subsequent recognized texts are incorrect:
```
from openai import OpenAI
client = OpenAI(api_key="cant-be-empty", base_url="http://192.168.31.100:8000/v1/")
…
```
-
### Steps to reproduce
case 'playyy': {
if (args.length < 1) return reply("Insira o comando, e em seguida um nome para a pesquisa!");
const { Innertube } = require('youtubei.js');
co…
-
Hugging Face has most models in other formats.
For example, the audio-to-text/text-to-audio model facebook/seamless-m4t-v2-large is in .safetensors format: https://huggingface.co/facebook/seamles…
-
### PsychoPy Version
2024.2.1
### What OS is your PsychoPy running on?
Windows 10
### Bug Description
My Python version: 3.10.11
I'm using a VENV virtual environment. I recently changed the v…