-
Hey guys!
I get the following error when converting my spectrograms to audio with MelGAN:
```
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 4.00 GiB total capaci…
```
-
# Task Name
Respiratory Sound Classification
## Task Objective
The objective of this task is to predict whether a respiratory sound recording indicates early-stage fatal lung diseases for better d…
-
Hello InternVideo team,
You guys have done a great job with this project!
In your paper, you use the Stage 2 model for the task of temporal grounding on QVHighlight [Lei et al., 2021] and Charad…
-
Hello!
As speech-to-text models such as Whisper are being added, having access to some of the impressive AI text-to-speech models would be a nice way to close the loop!
My current suggestion for a model …
-
### Model description
This model is a self-supervised Vision Transformer that uses patch reconstruction on spectrograms as its pretext task. It extends MAE (which is already on HuggingFace) to audio. This mo…
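As a rough illustration of what "patch reconstruction on spectrograms" involves, here is a minimal NumPy sketch of the MAE-style data path: split a spectrogram into non-overlapping patches, hide a random subset, and compute the reconstruction loss only on the hidden patches. The patch size, masking ratio, spectrogram shape, and helper names are assumptions for illustration, not this model's actual configuration, and the encoder/decoder are omitted entirely.

```python
import numpy as np

PATCH = 16          # assumed ViT-style patch size
MASK_RATIO = 0.8    # illustrative masking ratio (assumption, not from the model card)

def patchify(spec: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Split a (freq, time) spectrogram into flattened, non-overlapping patches."""
    f, t = spec.shape
    # Drop trailing bins so both axes divide evenly (a simplification; real code may pad).
    f, t = f - f % patch, t - t % patch
    grid = spec[:f, :t].reshape(f // patch, patch, t // patch, patch)
    # (freq_blocks, time_blocks, patch, patch) -> (num_patches, patch * patch)
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def random_mask(num_patches: int, ratio: float, rng: np.random.Generator):
    """Split patch indices into visible (encoder input) and masked (reconstruction targets)."""
    n_keep = int(round(num_patches * (1 - ratio)))
    perm = rng.permutation(num_patches)
    return np.sort(perm[:n_keep]), np.sort(perm[n_keep:])

def masked_mse(pred: np.ndarray, target: np.ndarray, masked_idx: np.ndarray) -> float:
    """MAE-style loss: mean squared error on the masked patches only."""
    return float(np.mean((pred[masked_idx] - target[masked_idx]) ** 2))

rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 1024))   # stand-in for a log-mel spectrogram
patches = patchify(spec)                  # shape (512, 256)
visible, masked = random_mask(len(patches), MASK_RATIO, rng)
# A real model would encode patches[visible] and predict patches[masked];
# an all-zero "prediction" is used here only to exercise the loss.
loss = masked_mse(np.zeros_like(patches), patches, masked)
```

Only the visible patches would be fed to the encoder, which is what makes high masking ratios cheap in MAE-style pretraining.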
-
My predict.py:
```
from utils.config_manager import ConfigManager
from utils.audio import Audio
from scipy.io.wavfile import write
config_loader = ConfigManager('ljspeech_autoregressive_trans…
```
-
Hi there,
since I have some experience in this field (audio delossify/upscale), I'd like to share what I have learned:
- When treating lossy audio, the best approach is to carefully decode and pr…
-
The new [MusicLM](https://arxiv.org/abs/2301.11325) relies on an audio CLIP named [MuLaN](https://arxiv.org/abs/2208.12415).
I will build out an initial implementation [here](https://github.com/luci…
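To make "an audio CLIP" concrete: CLIP-style models embed two modalities into a shared space and train with a symmetric contrastive loss, where each audio clip should score highest against its own text and vice versa. The NumPy sketch below shows only that loss, assuming MuLaN follows this CLIP-style objective; the encoders are omitted, the function names are hypothetical, and the temperature of 0.07 is borrowed from CLIP's initial value rather than from the MuLaN paper.

```python
import numpy as np

def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    z = logits - logits.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def contrastive_loss(audio_emb: np.ndarray, text_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: row i of audio_emb is the positive pair of row i of text_emb."""
    a, t = l2_normalize(audio_emb), l2_normalize(text_emb)
    logits = a @ t.T / temperature        # (batch, batch) scaled cosine similarities
    diag = np.arange(len(a))
    audio_to_text = -log_softmax(logits, axis=1)[diag, diag].mean()  # each audio picks its text
    text_to_audio = -log_softmax(logits, axis=0)[diag, diag].mean()  # each text picks its audio
    return float((audio_to_text + text_to_audio) / 2)

# Toy batch of 4: correctly paired embeddings vs. a shuffled pairing.
matched = contrastive_loss(np.eye(4), np.eye(4))
mismatched = contrastive_loss(np.eye(4), np.eye(4)[::-1])
```

Correct pairings give a loss near zero, while mismatched pairings are heavily penalized.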
-
I encountered an issue when trying to export the facebook/m2m100_418M model using the optimum-cli tool. The error message indicates that the m2m-100-encoder is not supported, despite m2m-100 being lis…
-
### Feature request
I would like to propose the addition of a new learning rate scheduler that combines MultiStepLR with a warmup phase. Currently, the Transformers library does not include a sched…
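One possible shape for the proposed scheduler, sketched as a multiplier function rather than a new Transformers class: a linear warmup followed by `MultiStepLR`-style step decay (multiply by `gamma` at each milestone), which PyTorch's `LambdaLR` can apply to an optimizer. The function name, the linear warmup shape, and the example milestones are assumptions for illustration; in this sketch, milestones are simply not applied until warmup ends.

```python
from bisect import bisect_right
from functools import partial

def warmup_multistep_lambda(step: int, warmup_steps: int, milestones: list, gamma: float = 0.1) -> float:
    """Multiplicative LR factor: linear warmup, then MultiStepLR-style decay.

    milestones must be sorted in increasing order (as MultiStepLR requires).
    """
    if step < warmup_steps:
        # Linear ramp from 0 toward 1 over the warmup phase.
        return step / max(1, warmup_steps)
    # Decay by gamma once for every milestone already reached (step >= milestone).
    return gamma ** bisect_right(milestones, step)

# Hypothetical settings: 10 warmup steps, decay at steps 100 and 200.
lr_lambda = partial(warmup_multistep_lambda, warmup_steps=10, milestones=[100, 200], gamma=0.1)

# With PyTorch installed, the factor could drive a scheduler like this:
#   from torch.optim.lr_scheduler import LambdaLR
#   scheduler = LambdaLR(optimizer, lr_lambda=lr_lambda)
#   # then call scheduler.step() once per optimizer step
```

Expressing the schedule as a pure function keeps it easy to test without a model, and the same function could back a named scheduler if one were added to the library.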