-
# Instruments
We have compared 3 easy-to-use **off-the-shelf instruments for voice activity / audio activity detection**:
- Silero VAD - https://github.com/snakers4/silero-vad;
- A po…
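For a sense of what such detectors produce, here is a toy energy-threshold VAD baseline, a minimal sketch and deliberately much cruder than the off-the-shelf tools listed above (the function name and threshold are illustrative, not from any of those libraries):

```python
def energy_vad(samples, frame_len=160, threshold=0.01):
    """Toy VAD baseline: flag each non-overlapping frame as speech (True)
    when its mean energy exceeds a fixed threshold.

    samples: mono audio as a list of floats in [-1.0, 1.0]
    frame_len: samples per frame (160 = 10 ms at 16 kHz)
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        flags.append(energy > threshold)
    return flags
```

Real tools like Silero VAD replace the fixed energy threshold with a learned model, which is why they hold up under noise where a sketch like this fails.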
-
Hi, I noticed there are two modes: Audio for 48kHz and Speech for 16kHz. Would the score be accurate if both the reference and degraded samples were at an 8kHz sample rate in speech mode?
I receive…
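A common workaround when samples are below a mode's native rate is to upsample both reference and degraded signals before scoring. A minimal linear-interpolation sketch (a hypothetical helper, not part of any scoring tool; production code would use a proper polyphase resampler):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal (list of floats) by linear interpolation.
    Toy sketch: no anti-aliasing filter, fine only for illustration."""
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Map output index i to a fractional position in the input.
        pos = i * (len(samples) - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```

For example, `resample_linear(signal, 8000, 16000)` doubles the sample count, so an 8 kHz pair can at least be fed to a 16 kHz speech mode; whether the resulting score is still meaningful for 8 kHz content is a separate question.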
-
### 🚀 The feature
I'm wondering if there are any researchers out there that can search an audio stream like an mp3 and determine whether or not the track is purely spoken word versus a song or musi…
-
Hello,
Thanks for your interesting work. I wanted to check whether the pre-trained checkpoints are available.
-
Hi,
first, thanks for this implementation of WaveNet!
I'm interested in performing feature extraction from raw audio files. These features will be used for different tasks such as voice activity de…
-
> From 2023/10/11 meeting https://g0v.hackmd.io/t9ypB87SQBuMjjW_PheZVg#Comm-AI-transcript
The current implementation for speech-to-text (based on Whisper API) suffers from hallucination problems. S…
-
**Name of the feature**
*In general, the feature you want added should be supported by HuggingFace's [transformers](https://github.com/huggingface/transformers) library:*
- *If requesting a **model…
-
Depending on how hackable the ncurses interface is, would it be possible to have actual voice chat support?
-
Hi,
I am currently trying to implement the speech-recorder Voice Activity Detection in my Electron app on my M1 Mac, and I am facing the following issue:
`Error: dlopen(/myElectronPath/node_modules…
-
Mini-Omni offers a great idea: coupling an LLM with TTS. Compared with waiting for the LLM's streamed output and then passing the text to TTS for synthesis, this should in theory significantly reduce latency.
But on the input side, what advantage, in quality or latency, does encoding the speech and feeding it directly into the model have over running ASR first and feeding the resulting text to the model?
I raise this mainly because, in human-machine dialogue, if we want to reduce response latency, how to optimize the VAD side is a major difficulty, such as…
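One concrete VAD-side latency knob is the endpointing hangover: how many consecutive silent frames must pass before the system decides the user has finished speaking. Shortening it cuts response delay but risks cutting the speaker off mid-sentence. A toy sketch of hangover-based endpoint detection (a hypothetical helper, not from Mini-Omni):

```python
def detect_endpoint(speech_flags, hangover=3):
    """Given per-frame speech flags (True = speech), return the index of
    the first silent frame that starts a run of `hangover` consecutive
    silent frames after speech has begun. Returns None if no endpoint."""
    started = False
    silence_run = 0
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            started = True
            silence_run = 0
        elif started:
            silence_run += 1
            if silence_run == hangover:
                # Endpoint is where the silent run began.
                return i - hangover + 1
    return None
```

With `hangover=3` and 10 ms frames, the system waits only 30 ms of silence before responding; tuning this value against false endpointing is exactly the difficulty raised above.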