Kerry0123 closed this issue 4 years ago
I am asking for your help. Thank you.
Hi @Kerry0123,
Did you retrain the model with your preprocessing steps or did you feed your spectrograms directly to the pretrained model?
I retrained the model with my preprocessing steps. The loss at epoch 1 is 0.66, and it then drops toward 0. I am asking for your help. Thank you.
@Kerry0123, something weird is going on because that loss is very low. What dataset are you using? The ZeroSpeech one? Also, could you share an example spectrogram so I can check if anything is odd?
The dataset is BZNSYP (a Chinese dataset). To align the output of the synthesizer with the input of the vocoder, I use the preprocessing from the tacotron2 synthesizer. Its GitHub link: https://github.com/cnlinxi/style-token_tacotron2. The command is: `python preprocess.py --dataset=biaobei --base_dir=/tmp-data/data/ --output=/nfs/volume-340-1/tts_data_preprocess/training_data_biaobe`. Is it convenient to tell me your email address? I will send you a mel file. Thank you.
Hi, I have a doubt about the mel preprocessing function. I use the following preprocessing method, and the generated audio file is silent.
```python
def melspectrogram(wav, hparams):
    D = _stft(preemphasis(wav, hparams.preemphasis, hparams.preemphasize), hparams)
    S = _amp_to_db(_linear_to_mel(np.abs(D), hparams), hparams) - hparams.ref_level_db
    return _normalize(S, hparams)

def _stft(y, hparams):
    if hparams.use_lws:  # False in my config
        return _lws_processor(hparams).stft(y).T
    else:
        return librosa.stft(y=y, n_fft=hparams.n_fft,
                            hop_length=get_hop_size(hparams),
                            win_length=hparams.win_size)

def _linear_to_mel(spectogram, hparams):
    global _mel_basis
    if _mel_basis is None:
        _mel_basis = _build_mel_basis(hparams)
    return np.dot(_mel_basis, spectogram)

def _amp_to_db(x, hparams):
    min_level = np.exp(hparams.min_level_db / 20 * np.log(10))
    return 20 * np.log10(np.maximum(min_level, x))

def _normalize(S, hparams):
    if hparams.allow_clipping_in_normalization:  # True in my config
        if hparams.symmetric_mels:  # True in my config
            return np.clip(
                (2 * hparams.max_abs_value) * ((S - hparams.min_level_db) / (-hparams.min_level_db)) - hparams.max_abs_value,
                -hparams.max_abs_value, hparams.max_abs_value)
        else:
            return np.clip(
                hparams.max_abs_value * ((S - hparams.min_level_db) / (-hparams.min_level_db)),
                0, hparams.max_abs_value)
```
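To see what range this normalization actually produces, here is a minimal, self-contained sketch of the `_amp_to_db` + `_normalize` path using the hparams values quoted in this thread (`ref_level_db = 20`, `max_abs_value = 4`, symmetric mels, and an assumed `min_level_db = -100`). The `SimpleNamespace` hparams and the random input are stand-ins, not the repo's actual code:

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical hparams mirroring the values discussed in this issue.
hparams = SimpleNamespace(
    min_level_db=-100,
    ref_level_db=20,
    max_abs_value=4.0,
    symmetric_mels=True,
)

def _amp_to_db(x, hp):
    # Amplitude floor: exp(-100/20 * ln 10) = 1e-5, so dB never goes below -100.
    min_level = np.exp(hp.min_level_db / 20 * np.log(10))
    return 20 * np.log10(np.maximum(min_level, x))

def _normalize(S, hp):
    if hp.symmetric_mels:
        # Linearly map [min_level_db, 0] dB onto [-max_abs, +max_abs], then clip.
        return np.clip(
            (2 * hp.max_abs_value) * ((S - hp.min_level_db) / (-hp.min_level_db)) - hp.max_abs_value,
            -hp.max_abs_value, hp.max_abs_value)
    return np.clip(hp.max_abs_value * ((S - hp.min_level_db) / (-hp.min_level_db)),
                   0, hp.max_abs_value)

# Random magnitudes stand in for np.abs(D) after the mel filterbank.
mag = np.abs(np.random.randn(80, 100))
S = _amp_to_db(mag, hparams) - hparams.ref_level_db
mel = _normalize(S, hparams)
print(mel.min(), mel.max())  # always inside [-4, 4] because of np.clip
```

This makes the mismatch concrete: with `symmetric_mels=True` the vocoder receives features in [-4, 4], not [0, 1].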
The main difference is `S = _amp_to_db(_linear_to_mel(np.abs(D), hparams), hparams) - hparams.ref_level_db` and `_normalize`, with `hparams.ref_level_db = 20` and `hparams.max_abs_value = 4`. My data is in [-4, 4], while your preprocessing produces data in [0, 1]. Does the data range have a great influence on the model? I don't understand. I am asking for your help. Thank you.
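If the vocoder was trained on [0, 1] mels, one option (assuming both pipelines use the same `min_level_db` and `ref_level_db`, which needs checking) is to linearly rescale the synthesizer's symmetric [-4, 4] output before feeding the vocoder. A minimal sketch; the function names here are my own, not from either repo:

```python
import numpy as np

def symmetric_to_unit(mel, max_abs_value=4.0):
    """Map mels normalized to [-max_abs_value, max_abs_value] onto [0, 1]."""
    return (mel + max_abs_value) / (2 * max_abs_value)

def unit_to_symmetric(mel, max_abs_value=4.0):
    """Inverse mapping: [0, 1] back to [-max_abs_value, max_abs_value]."""
    return mel * (2 * max_abs_value) - max_abs_value

m = np.array([-4.0, 0.0, 4.0])
print(symmetric_to_unit(m))  # maps -4 -> 0, 0 -> 0.5, 4 -> 1
```

Note this only fixes the normalization range; if the dB reference or the mel filterbank parameters also differ, rescaling alone will still give a mismatch, and retraining the vocoder on mels produced by the synthesizer's own preprocessing is the safer route.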