-
# I would like to add mispronunciation detection to the SUPERB benchmark tests.
**Why I would like to do this**
I am doing my Part II project at the University of Cambridge, and I would like to ext…
-
### Issue Summary
We use MathJax with a customized plugin in Reveal.js slides to produce lecture slides.
Currently we do not enable the math menu of MathJax on page load as it increases load tim…
-
The paper doesn't seem to actually list any of the open-source datasets used.
```
Speech-only datasets We employ open-sourced
large-scale speech datasets, totaling 460K hours of
speech or 30B sp…
```
-
Description
In light of the digital transformation of public services, this project endeavours to create a UI component tailored for Flutter applications. The main objective is to address the chall…
-
First of all, thanks for this amazing work benchmarking the several available RNNT implementations. This is more of a "discussion" than an issue.
I am sure you are aware of this, but t…
-
Hi,
Congrats on your work!
I have run the scripts to download and extract the clips, but when I tried to inspect some clips I noticed that they don't depict the mouth, for example for muavic/es/…
-
**Describe the bug**
Audio-Webui does not install the requirements properly; specifically, audiolm fails to install.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to 'audio-…
-
The title of the paper https://arxiv.org/pdf/2410.15608 is
> Moonshine: Speech Recognition for Live Transcription and Voice Commands
However, the model is non-streaming. Could you describe…
-
# Issue
Following the documentation for [Creating Voice Audio Files](https://cloud.google.com/text-to-speech/docs/create-audio#ssml) with Google Cloud Platform's _Text-to-Speech_ API, the following…
-
#WIP
## Benchmark with [faster-whisper-large-v3-turbo-ct2](https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2)
For reference, here's the time and memory usage that are required to tr…