-
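Hello, I recently tried the CAM++ Speaker Diarization (Dialogue-Scenario Role Separation, General) model on ModelScope, and it works very well! I see that this model is actually composed of several models, and that the CAM++ Speaker Verification (Chinese, General, 200k-Spkrs) model among them is presumably used to extract speaker embeddings. So I am wondering whether it would in theory be possible to use the speaker verification model in advance to save some speakers' voices as embeddings, and then, when calling the speaker diarization model for the host-grouping task, output not only the original…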
-
Hi, it is a really interesting work, but I have a question about the modelling of the prosody.
In the "2.4 Speech Decoder" section, I note that there is an operation "consecutive identical indices a…
-
How can I change the speech recognition interface? I want to use iFlytek's or our own speech recognition.
I want to have a conversation in Chinese.
-
**Project description**
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), auto-speech-recognition…
-
Hi,
First of all, thanks for this wonderful piece of work. I am really impressed with it, and it has helped me in my journey so far.
I am trying to learn [Azure OpenAI](https://learn.microsoft.com/en-u…
-
- [x] I have read and agree to the [contributing guidelines](https://github.com/griptape-ai/griptape#contributing).
It would be swifty nifty to have this open-source TTS as a text-to-speech driver…
-
**Parent ticket:** [Feature: Audio File Handling](https://github.com/marawanxmamdouh/ConvoNerd/issues/17)
### Description:
Implement a speech-to-text module to transcribe audio content into text.
…
-
@Niketkumardheeryan I would like to add a lip-reading model (LipNet) using CV.
This project aims to build a lip reading model that takes a video of speech without audio and produces the output as text t…
-
Hi @streamer45, thanks for your awesome package! I found discrepancies between silero-vad-go and the Python package. My input file is a 13-minute-long speech of JFK, and silero-vad-go misses multiple …
wjkoh updated 3 weeks ago