Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License
hyperparam "min_level_db" default value issue #293

Closed min8328 closed 5 years ago

min8328 commented 5 years ago

Hi, I found a difference between the default value of min_level_db and what the paper says about it.

Section 2.2, paragraph 2 of the Tacotron 2 paper says: "Prior to log compression, the filterbank output magnitudes are clipped to a minimum value of 0.01 in order to limit dynamic range in the logarithmic domain."

But the corresponding part of datasets/audio.py looks like this:

    def _amp_to_db(x, hparams):
        min_level = np.exp(hparams.min_level_db / 20 * np.log(10))
        return 20 * np.log10(np.maximum(min_level, x))

and in hparam.py, the default value of min_level_db is:

    # Limits
    min_level_db = -100,

Shouldn't it be -40 to stay consistent with the paper? Or does this not matter much for performance or stability? Thanks in advance for looking into this.
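To make the arithmetic behind the question concrete, here is a minimal sketch that converts between a dB threshold and the amplitude floor it implies. It reads the quoted expression as min_level_db / 20 * np.log(10) and uses the standard-library math module in place of NumPy. The helper names floor_from_db and db_from_floor are hypothetical and do not appear in the repository.

```python
import math

def floor_from_db(min_level_db):
    """Amplitude floor implied by a dB threshold.

    exp(dB / 20 * ln(10)) is the same quantity as 10 ** (dB / 20).
    """
    return math.exp(min_level_db / 20 * math.log(10))

def db_from_floor(min_level):
    """Inverse mapping: the dB value that corresponds to an amplitude floor."""
    return 20 * math.log10(min_level)

default_floor = floor_from_db(-100)  # default quoted from hparam.py
paper_floor = floor_from_db(-40)     # value proposed in this issue
paper_db = db_from_floor(0.01)       # clipping value quoted from the paper

print(default_floor, paper_floor, paper_db)
```

Under this reading, -100 dB corresponds to a floor of about 1e-5 and -40 dB to about 0.01, which is where the -40 in the question comes from.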

Rayhane-mamah commented 5 years ago

Hi, thanks for reaching out.

I have tried both and found that the current setup works best (around 0.11 minimal bound). The audio preprocessing parameters largely depend on your data, and the defaults are optimized for the LJSpeech dataset, so please adapt them as you like for your own dataset. If you want consistency with the paper, replace np.exp with 10**, replace np.log with np.log10, and set min_level_db to -40 in the min_level computation. That will make your lower bound 0.01.

If you do try other options, feel free to share your results :)
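The paper-consistent variant described in this reply could look roughly like the sketch below. It is a scalar illustration that uses only the standard library, whereas the repository function operates on NumPy arrays and reads the threshold from hparams. The name amp_to_db_paper, the default argument, and the sample magnitudes are assumptions made for the example.

```python
import math

def amp_to_db_paper(x, min_level_db=-40):
    """Clip a magnitude at the floor implied by min_level_db, then convert to dB."""
    # Base-10 spelling suggested in the reply: 10** instead of np.exp,
    # log10 instead of np.log. Since log10(10) == 1, this is 10 ** (dB / 20).
    min_level = 10 ** (min_level_db / 20 * math.log10(10))
    return 20 * math.log10(max(min_level, x))

# Made-up magnitudes, including values below the 0.01 floor.
magnitudes = [0.0, 0.001, 0.01, 0.5, 1.0]
db_values = [amp_to_db_paper(m) for m in magnitudes]

for m, db in zip(magnitudes, db_values):
    print(f"{m:>6} -> {db:.2f} dB")
```

Every magnitude at or below 0.01 is mapped to -40 dB, so the clipping also keeps the logarithm away from zero input. Numerically, the exp/log and 10**/log10 spellings give the same floor for a given min_level_db, so the value of min_level_db is what moves the bound.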

min8328 commented 5 years ago

I see. I'll try what you said. Thank you. :)