-
I have successfully trained TTS for Indonesian. I have also attached my results alongside the original utterances for comparison. The model was trained on 23,400 audio clips (about 17 hours) using modified parameters as attached…
-
#### Description
```librosa.util.exceptions.ParameterError: Invalid shape for monophonic audio: ndim=2, shape=(20, 345)```
I am trying to compute PCEN images for the given audio files. In the proces…
-
Hi, training is quite slow, and I'm guessing that the spectrograms are generated during training. Can I preprocess all the audio files beforehand to speed up training?
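A minimal sketch of the idea, assuming the features can be computed once and cached on disk as `.npy` files. The names `compute_features` and `cached_features` are hypothetical, and the log-magnitude STFT here is only a stand-in for whatever spectrogram step the training code actually uses:

```python
import tempfile
from pathlib import Path

import numpy as np


def compute_features(wave, n_fft=256, hop=128):
    """Hypothetical stand-in for the spectrogram step: log-magnitude STFT."""
    window = np.hanning(n_fft)
    starts = range(0, len(wave) - n_fft + 1, hop)
    frames = np.stack([wave[s:s + n_fft] * window for s in starts])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).astype(np.float32)


def cached_features(name, wave, cache_dir):
    """Return features for `name`, computing and saving them only on first use."""
    path = Path(cache_dir) / f"{name}.npy"
    if path.exists():
        return np.load(path)
    feats = compute_features(wave)
    np.save(path, feats)
    return feats


cache_dir = tempfile.mkdtemp()
wave = np.random.default_rng(0).standard_normal(1024)  # synthetic audio
first = cached_features("utt0001", wave, cache_dir)    # computed and written
second = cached_features("utt0001", wave, cache_dir)   # read back from disk
```

With a cache like this, the data loader only reads arrays during training, so the spectrogram cost is paid once per file rather than once per epoch.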
-
#### Description
The transformation of a windowed Fourier spectrogram into a mel-frequency spectrogram returns a NumPy array in double precision (type `numpy.float64`) even when this windowed Fourie…
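The promotion can be reproduced with plain NumPy, under the assumption that the mel filterbank is stored in `float64` while the spectrogram is `float32`. The arrays below are random stand-ins, not the library's actual filterbank:

```python
import numpy as np

rng = np.random.default_rng(0)
spec = rng.random((513, 10)).astype(np.float32)  # stand-in float32 spectrogram
mel_basis = rng.random((40, 513))                # stand-in filterbank, float64 by default

mel_f64 = mel_basis @ spec                       # NumPy promotes the product to float64
mel_f32 = mel_basis.astype(spec.dtype) @ spec    # casting the filterbank keeps float32

print(mel_f64.dtype, mel_f32.dtype)
```

Casting the filterbank to the input's dtype before the product is one possible way to preserve single precision; whether that is the right fix for the library is a separate question.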
-
1. In your experience, what are the things that you look out for when deciding on the melspectrogram amplitude? Specifically, I'm asking why you decided not to normalise the melspectrogram in your…
-
In https://github.com/keunwoochoi/torchaudio-contrib/pull/34#discussion_r271428800, we discussed what purpose we actually see for the functional interface.
My stance would be to follow `torch.nn.fu…
-
* STFT
* Melspectrogram
* Filterbanks - mel, cqt
* Pseudo-cqt? (I have code for it)
* mag-to-db, db-to-mag
* mag(real, imag representations), phase(real, imag representation)
* FFT-based fast …
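As an illustration of the mag-to-db and db-to-mag item, here is a rough sketch using the common amplitude convention of 20·log10. The function names and the `amin` floor are assumptions made for this example, not an agreed interface:

```python
import numpy as np


def mag_to_db(mag, amin=1e-10):
    """Convert magnitudes to decibels, flooring at `amin` to avoid log(0)."""
    return 20.0 * np.log10(np.maximum(mag, amin))


def db_to_mag(db):
    """Invert mag_to_db for values above the floor."""
    return 10.0 ** (db / 20.0)


mag = np.array([1.0, 10.0, 0.1])
db = mag_to_db(mag)
restored = db_to_mag(db)
```

The round trip is exact only for magnitudes above `amin`; anything below the floor comes back as `amin`.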
-
I have a model trained on LJ for 200k steps, and then I fine-tune on a smaller dataset on top. So far I've had success with a couple of datasets (male and female) for data that ranges 15-30 minutes in le…
-
* For STFT, unlike in Kapre where I used `Spectrogram`, I think `STFT` is a better name - it's more precisely correct.
* I'm fine with following most of the `librosa` argument names. Here's libros…