-
### Feature request
It would be great if we could perform audio classification with Whisper. As an example, it could be used for language detection:
- https://huggingface.co/sanchit-gandhi/whisper…
-
Hey @Datseris!! I saw the music transcription project for JuliaMusic on the JuliaLang website, and it looks like an exciting project.
I wanted to discuss what you are looking for and would love to make a proof of con…
-
# The Illustrated Image Captioning using transformers - Ankur NLP Enthusiast
[https://ankur3107.github.io/blogs/the-illustrated-image-captioning-u…
-
Hello author,
First, thank you for sharing this repo; it is really nice.
I have a question:
1. I downloaded CMU data for a single speaker with 100 audio clips and made a speaker embedding vector and sy…
-
Hello p0p4k,
I'm reaching out to you again with a question.
Thanks to your great help, I've successfully trained the Korean pflow model and run inference with it. During the inference process, I observed a f…
-
# 🌟 New model addition
## Model description
**What type of model is FastPitch 1.1?**
It is a mel-spectrogram generator (part of a text-to-speech model engine) that mainly comprises two F…
-
When testing the model from https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593 I encountered an issue where inference fails due to a broadcast mismatch.
**Steps to reproduce:**
```
…
```
-
For the moment, I have only been experimenting with the raw signal data and its FFT. However, I am sure much can be gained by cleaning up the input signal.
Off the top of my head, things worth…
-
# NVIDIA NeMo (ByT5 G2P and G2P-Conformer):
> NVIDIA NeMo provides grapheme-to-phoneme models for various languages, including **German**.
> The ByT5 G2P model is based on a neural network and can…