NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference
BSD 3-Clause "New" or "Revised" License

STFT magnitude minimum value #105

Closed min8328 closed 5 years ago

min8328 commented 5 years ago

Hi there, I have found a small difference between what the paper says and what the code does.

Section 2.2, paragraph 2 of the Tacotron 2 paper says: "Prior to log compression, the filterbank output magnitudes are clipped to a minimum value of 0.01 in order to limit dynamic range in the logarithmic domain."

But the corresponding code in audio_processing.py uses a default clip value of 1e-5:

    def dynamic_range_compression(x, C=1, clip_val=1e-5):
        return torch.log(torch.clamp(x, min=clip_val) * C)

Does this difference matter for performance or stability? Thank you in advance for looking into this.
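
For reference, the two clip values can be compared by the lowest log-magnitude each one allows. This is only a rough sketch using the standard library instead of torch; `log_floor`, `PAPER_CLIP`, and `REPO_CLIP` are names made up for illustration, and C is assumed to be 1 as in the default above.

```python
import math

PAPER_CLIP = 0.01  # minimum value quoted from the paper
REPO_CLIP = 1e-5   # default clip_val in dynamic_range_compression


def log_floor(clip_val, C=1):
    """Smallest value that log(clamp(x, min=clip_val) * C) can produce."""
    return math.log(clip_val * C)


paper_floor = log_floor(PAPER_CLIP)
repo_floor = log_floor(REPO_CLIP)

# How much further down the log-domain range extends with the repo default.
extra_range = paper_floor - repo_floor

print(f"paper floor: {paper_floor:.3f}")
print(f"repo floor:  {repo_floor:.3f}")
print(f"extra range: {extra_range:.3f} nats")
```

With the natural log, the repo default lets values go about ln(1000) lower than the paper's setting, so the compressed features cover a wider range at the quiet end.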

rafaelvalle commented 5 years ago

Should not be a big deal. Clamping with a higher threshold can remove noise, for example, and make training easier.
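
To illustrate the point about the higher threshold, here is a small sketch, assuming some made-up magnitude values (they are not taken from any real spectrogram). `compress` is a hypothetical pure-Python stand-in for the torch function quoted above, not the repo's code.

```python
import math


def compress(mags, clip_val, C=1):
    """Pure-Python stand-in for torch.log(torch.clamp(x, min=clip_val) * C)."""
    return [math.log(max(m, clip_val) * C) for m in mags]


# Hypothetical magnitudes: three quiet, noise-like bins and two louder bins.
mags = [2e-5, 3e-4, 5e-3, 0.2, 1.5]

low = compress(mags, clip_val=1e-5)   # repo default
high = compress(mags, clip_val=0.01)  # value from the paper

# Count distinct outputs among the three quiet bins under each threshold.
distinct_low = len(set(low[:3]))
distinct_high = len(set(high[:3]))

print("quiet bins, clip 1e-5:", [round(v, 2) for v in low[:3]])
print("quiet bins, clip 0.01:", [round(v, 2) for v in high[:3]])
print("loud bins unchanged:", low[3:] == high[3:])
```

With the higher threshold the quiet bins all collapse to the same floor value, while the louder bins are untouched. Whether that flattening helps training in practice would still need to be checked on real data.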

min8328 commented 5 years ago

Thank you for the quick response. Your comment also helped broaden my perspective.