Strange effects in re-synthesized audio

Hi,

I am working on re-synthesizing my audio. I first extract a sequence of character units with the HuBERT (hubert_base_ls960) model, then pass that sequence to a Tacotron 2 model to generate a mel-spectrogram, which the WaveGlow model finally converts back into audio. When I compare the original and the re-synthesized audio, I hear some strange effects in the re-synthesized one. Can you advise me on what is going wrong here?

I have attached 2 audio files for your reference.
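
In case it helps with the comparison, here is a minimal sketch of how the basic properties of the two files could be checked. It assumes both are PCM WAV files, and the paths passed on the command line are placeholders, not the actual attachments. It only compares header values (sample rate, channels, sample width, duration). hubert_base_ls960 was trained on 16 kHz LibriSpeech audio, while the default NVIDIA Tacotron 2 / WaveGlow configuration uses 22050 Hz, so a sample-rate mismatch between stages is one thing that may be worth ruling out.

```python
import sys
import wave


def wav_info(path):
    """Return basic header properties of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
        }


def compare_wavs(path_a, path_b):
    """Return only the properties that differ, as (value_a, value_b) pairs."""
    a, b = wav_info(path_a), wav_info(path_b)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


if __name__ == "__main__":
    # Usage (paths are placeholders): python compare_wavs.py original.wav resynth.wav
    if len(sys.argv) == 3:
        for key, (orig, resynth) in compare_wavs(sys.argv[1], sys.argv[2]).items():
            print(f"{key}: original={orig}, re-synthesized={resynth}")
```

An empty result only means the headers match. It says nothing about the audio content itself, so it is a first sanity check rather than a diagnosis.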

Thanks

Audio Files

NVIDIA / tacotron2

Strange effects in re-synthesized audio #604