Hi there, I found a small difference between what the paper says and what the code does.
In Section 2.2, paragraph 2 of the Tacotron 2 paper, it says:
"Prior to log compression, the filterbank output magnitudes are clipped to a minimum value of 0.01 in order to limit dynamic range in the logarithmic domain."
But the corresponding part of audio_processing.py uses a default clip value of 1e-5 instead:
def dynamic_range_compression(x, C=1, clip_val=1e-5):
    return torch.log(torch.clamp(x, min=clip_val) * C)
Does this difference matter for performance or training stability?
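To make the gap concrete, here is a minimal sketch of the lowest value the log compression can output under each clip value. It uses only the standard library; `log_floor` is a hypothetical helper name I made up for illustration, not something from the repository, and it assumes the default C=1.

```python
import math

def log_floor(clip_val, C=1):
    """Smallest value log(clamp(x, min=clip_val) * C) can take (hypothetical helper)."""
    return math.log(clip_val * C)

paper_floor = log_floor(0.01)  # clip value stated in the paper
code_floor = log_floor(1e-5)   # default clip_val in audio_processing.py

print(f"paper floor: {paper_floor:.3f}")
print(f"code floor:  {code_floor:.3f}")
print(f"extra range: {paper_floor - code_floor:.3f}")
```

So the code's floor sits ln(1000), roughly 6.9, below the paper's floor in the natural-log domain, which means near-silent mel bins can take much more negative values than the paper describes.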
Thank you in advance for looking into this.