-
> For the encoder, I have a question... if I understand correctly, the target is just to maximize the similarity between two audio clips of the same speaker, i.e. to minimize the distance between them
> So, we could imagine t…
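
A minimal NumPy sketch of that objective, assuming embeddings are compared with cosine similarity (the vectors here are hypothetical, not from any real encoder):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-d embeddings: two utterances from the same speaker
# (nearly aligned) and one from a different speaker.
same_a = np.array([1.0, 0.0, 0.5, 0.0])
same_b = np.array([0.9, 0.1, 0.6, 0.0])
other = np.array([0.0, 1.0, 0.0, 0.8])

# Toy objective: push same-speaker similarity toward 1, so the
# same-speaker pair incurs a much smaller loss than the cross-speaker pair.
loss_same = 1.0 - cosine_similarity(same_a, same_b)
loss_diff = 1.0 - cosine_similarity(same_a, other)
```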
ghost updated
3 years ago
-
I have trained an end-to-end model with Tacotron2 and WaveGlow, using 25 hours of a female voice to train both the Tacotron2 and WaveGlow models. The sampling rate of the training data is 22050 Hz.
At th…
-
Hello everyone,
This is not an issue but rather a question.
As seen in the implementation, the feature extraction is done as follows:
`n_fft = int(self.sample_rate * self.window_size)`
`win_le…
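
For reference, the quoted computation can be sketched like this, assuming (as in many speech configs) that `window_size` and `window_stride` are given in seconds — the concrete numbers here are hypothetical:

```python
# Hypothetical config values: window sizes expressed in seconds.
sample_rate = 16000     # Hz
window_size = 0.02      # 20 ms analysis window
window_stride = 0.01    # 10 ms hop between frames

# Frame parameters in samples, mirroring the lines quoted above.
n_fft = int(sample_rate * window_size)          # 320
win_length = int(sample_rate * window_size)     # 320
hop_length = int(sample_rate * window_stride)   # 160
```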
-
Hi, I went through the source code of your project and noticed a difference between what is returned by MelSpectrogram and STFT.
The spectrogram returns the magnitude, i.e. sqrt(Re**2 + Im**2), but …
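
For what it's worth, the two conventions differ only by a square. A minimal NumPy illustration of the relationship (not the project's actual code):

```python
import numpy as np

# One FFT frame of a toy random signal.
x = np.random.default_rng(0).standard_normal(512)
spec = np.fft.rfft(x)

magnitude = np.abs(spec)   # sqrt(Re**2 + Im**2), a magnitude spectrum
power = magnitude ** 2     # Re**2 + Im**2, a power spectrum

# The explicit forms agree with the shorthand above.
assert np.allclose(magnitude, np.sqrt(spec.real**2 + spec.imag**2))
assert np.allclose(power, spec.real**2 + spec.imag**2)
```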
-
Hi,
I am trying to use the _MelSpectrogram_ module of torchaudio 0.4.0 with PyTorch 1.4.0 to calculate mel spectrograms for audio signals during training. When I run the code on a _Tesla P100 SX…
-
The usage instructions are missing some information about the feature preprocessing step.
- It would be helpful to give more precise instructions on how to extract the spectrogram and phoneme features. Doe…
-
git commit : af6f86252e6fe21fc02357fe681344f89d75a0ca
torch : 1.5.0
tensorflow : 2.3.0
GPU: 4 GTX1080 8GB
Question: when I use 4 GPUs, I set the batch size to 8, 16, or 32, but an out-of-memory error occ…
-
commit: f632f59f4000e547a5def0e4029dc3af30da4047
tensorflow: 2.3.1
pytorch: 1.6.0
GPU: 4 * GTX 1080
config.json
```json
{
    "model": "Tacotron2",
    "run_name": "ljspeech-ddc",
    "run_descript…
```
-
Hello, could anyone give an example of reconstructing a WAV from MFCCs with librosa?
I've tried several algorithms, but the reconstruction quality is pretty bad.
An example of what I need: https://www.research.ibm.com/haifa/…
-
I observed an interesting behaviour after 138K iterations, where the discriminator dominated the training and the generator's losses exploded on both train and validation. Do you have any idea why this happens and how to prevent …
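
No definitive answer, but one common mitigation when a discriminator dominates is to throttle its updates (or lower its learning rate) once it becomes too strong. A toy sketch of the gating idea — the threshold value here is purely hypothetical:

```python
def should_update_discriminator(d_loss, threshold=0.3):
    """Skip the discriminator step when its loss is already below a
    (hypothetical) threshold, i.e. when it is winning too easily."""
    return d_loss >= threshold

# In a training loop this would look roughly like:
#   if should_update_discriminator(d_loss.item()):
#       d_optimizer.step()
```

Other common options are a gradient penalty on the discriminator or simply resuming from a checkpoint before the collapse with a lower discriminator learning rate.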