-
Can we have a forced alignment feature using something like **[aeneas](https://github.com/readbeyond/aeneas)**?
This tool seems to work really well at automagically synchronizing audio to text a…
-
After changing the code to read from a local path:
conversation = [
{'role': 'system', 'content': 'You are a helpful assistant.'},
{"role": "user", "content": [
{"type": "audio", "audio_url": audio_path…
-
### Description:
Develop a process that improves the quality of inference transcriptions for audio files using Claude AI by aligning them with a verified transferred text. The transferred text is know…
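
One way to picture the alignment step described above is a word-level diff between the inferred transcription and the verified text. The following is only a minimal sketch using Python's standard `difflib`; the function name `transcript_diff` is hypothetical, and the actual process (including how Claude AI would be involved) is not specified in this issue.

```python
from difflib import SequenceMatcher


def transcript_diff(hypothesis: str, reference: str):
    """Return the word-level differences between two transcripts.

    hypothesis: the inferred (machine) transcription.
    reference:  the verified text to align against.
    Each result is (tag, hypothesis_words, reference_words), where tag is
    'replace', 'delete', or 'insert' as reported by difflib.
    """
    hyp_words = hypothesis.split()
    ref_words = reference.split()

    # Compare case-insensitively so "The" and "the" count as a match.
    matcher = SequenceMatcher(
        a=[w.lower() for w in hyp_words],
        b=[w.lower() for w in ref_words],
        autojunk=False,
    )

    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, hyp_words[i1:i2], ref_words[j1:j2]))
    return diffs


if __name__ == "__main__":
    for tag, hyp, ref in transcript_diff(
        "the quick brown fox jumps", "The quick brown fox jumped"
    ):
        print(tag, hyp, "->", ref)
```

Each reported difference marks a span where the inferred transcription disagrees with the verified text, which is where a correction step would focus.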
-
Rendering options would allow the user to control how the item values are transformed to output.
Concrete examples would be:
- URLs. These could be used to render a hyperlink, or an image or some o…
-
**Description:**
We would like to request the integration of subtitles or VTT files with the Livepeer player to support closed captioning. This feature would enhance accessibility by providing audio-t…
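
For reference, a VTT (WebVTT) caption file is plain text: a `WEBVTT` header followed by cues, each with a `start --> end` timestamp line and the caption text. The sketch below shows how such a file could be generated from timed text; the helper names `format_timestamp` and `to_webvtt` are hypothetical, and nothing here reflects how the Livepeer player itself would load captions.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(cues) -> str:
    """Build WebVTT text from an iterable of (start, end, text) cues.

    start and end are times in seconds; text is the caption to display.
    """
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_webvtt([(0.0, 2.5, "Hello"), (2.5, 5.0, "Welcome to the stream")]))
```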
-
If I just want to apply Text-guided Audio-to-Audio Style Transfer to long text, will it be feasible to seamlessly transition from one audio to another as the prompt changes?
-
Hello, I installed FlowSep and ran the file `lass_inference.py` like this:
```shell
python3 lass_inference.py --text 'text_of_the_audio' --audio 'path_to_the_audio'
```
but I got this error:
`…
-
Hi, thank you for your excellent work. As we know, in text-to-text models, we can perform Retrieval-Augmented Generation (RAG). For more clarification, I have my personal data in text format, but to m…
-
I'm trying to run Multimodal RAG for processing videos using OpenAI GPT4V and LanceDB vectorstore
https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/multi_modal/multi_modal_video…
-
We just added support for more file types when you attach/paste/drop them. We also have support for turning audio into text, see (src/lib/speech-recognition.ts). Let's add support for importing audi…