NVIDIA / waveglow

A Flow-based Generative Network for Speech Synthesis
BSD 3-Clause "New" or "Revised" License

audio normalization value changed (inference process) #148

Closed: Ella77 closed this issue 4 years ago

Ella77 commented 5 years ago

I ran inference with the pretrained model, but for every dataset except LJ Speech the result had a low pitch.

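In mel2samp.py:

def get_mel(self, audio):
    # normalization commented out for this check
    #audio_norm = audio / MAX_WAV_VALUE
    # print the peak absolute sample value of the raw int16 audio
    print(abs(audio).max().item())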

First, I checked that the audio was correctly loaded as int16 samples: across files, abs(audio).max().item() reached a maximum of 32768.0 and a minimum of 792.0. After dividing by 32768.0 (the normalization step), the audio never exceeds 1 in absolute value.
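A minimal sketch of the normalization check above, written in plain Python so it runs without torch. It assumes 16-bit PCM samples held in a standard-library array('h'); MAX_WAV_VALUE = 32768.0 is the divisor mentioned above, while peak_int16 and normalize are hypothetical helper names, not functions from the repository.

```python
from array import array

# Divisor used for normalization (int16 full scale); assumed to match MAX_WAV_VALUE.
MAX_WAV_VALUE = 32768.0


def peak_int16(samples: array) -> float:
    """Hypothetical helper: abs(audio).max() for a sequence of int16 samples."""
    return float(max(abs(s) for s in samples))


def normalize(samples: array) -> list[float]:
    """Hypothetical helper mirroring `audio_norm = audio / MAX_WAV_VALUE`.

    For int16 input the result always lies in [-1.0, 1.0).
    """
    return [s / MAX_WAV_VALUE for s in samples]


if __name__ == "__main__":
    audio = array("h", [0, 792, -32768, 32767, -100])
    print(peak_int16(audio))                       # 32768.0
    print(max(abs(x) for x in normalize(audio)))   # 1.0
```

Because the most negative int16 value is -32768, dividing by 32768.0 maps the full int16 range onto [-1.0, 1.0), which is why the normalized training audio never exceeds 1 in absolute value.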

But after inference in inference.py:

audio = waveglow.infer(mel, sigma=sigma)
if denoiser_strength > 0:
    audio = denoiser(audio, denoiser_strength)

# print the peak absolute value of the float output
print(abs(audio).max().item())
#audio = audio*abs(audio).max()
#audio = audio * 32768.0

When I print it, abs(audio).max() exceeds 1 for nearly half of the outputs. I appended each peak value to a list and checked the list's minimum and maximum: the minimum is 0.484375 and the maximum is 2.12109375.
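The peak bookkeeping described above can be sketched as follows. The helper peak_stats is hypothetical, and apart from the two extremes reported above, the peak values in the usage example are made up for illustration, not measurements.

```python
def peak_stats(peaks: list[float]) -> dict[str, float]:
    """Hypothetical helper summarizing per-utterance abs(audio).max().item() values."""
    if not peaks:
        raise ValueError("peaks must be non-empty")
    over = sum(1 for p in peaks if p > 1.0)
    return {
        "min": min(peaks),
        "max": max(peaks),
        # Fraction of outputs that exceed full scale once multiplied by 32768.
        "frac_over_1": over / len(peaks),
    }


if __name__ == "__main__":
    # Illustrative list: 0.484375 and 2.12109375 are the reported extremes,
    # 0.9 and 1.3 are invented filler values.
    peaks = [0.484375, 2.12109375, 0.9, 1.3]
    print(peak_stats(peaks))
    # {'min': 0.484375, 'max': 2.12109375, 'frac_over_1': 0.5}
```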

I think this is the cause of my problem, because when I changed the scaling to audio = audio * 32768.0 / 2, the resulting wav had normal pitch, though with a little noise (female voices became clearer, but male voices quieter). I suspect that the 1.23 value cannot be a fixed constant; I think it varies with the dataset.
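A hedged sketch of two ways to turn float output whose peak exceeds 1 into int16 samples without overflow: a fixed gain (gain=0.5 mirrors the `32768.0 / 2` experiment) or per-utterance peak normalization, which adapts to each clip instead of relying on a dataset-specific constant. Both are assumptions for illustration, not what waveglow's inference.py does, and to_int16 is a hypothetical helper. In either case, clamping keeps samples above full scale from overflowing when stored as int16.

```python
MAX_WAV_VALUE = 32768.0
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16(audio: list[float], gain: float = 1.0,
             peak_normalize: bool = False) -> list[int]:
    """Hypothetical helper: convert float vocoder output to clamped int16 samples.

    gain=0.5 mimics the fixed `audio * 32768.0 / 2` experiment;
    peak_normalize=True rescales each utterance so its largest sample
    reaches full scale (an assumed alternative, not the repository's method).
    """
    if peak_normalize:
        peak = max((abs(x) for x in audio), default=0.0)
        if peak > 0.0:
            audio = [x / peak for x in audio]
    out = []
    for x in audio:
        s = round(x * gain * MAX_WAV_VALUE)
        # Clamp instead of letting out-of-range values overflow the int16 range.
        out.append(max(INT16_MIN, min(INT16_MAX, s)))
    return out


if __name__ == "__main__":
    audio = [0.5, -2.0, 2.12109375]
    print(to_int16(audio))                       # [16384, -32768, 32767]
    print(to_int16(audio, gain=0.5))             # [8192, -32768, 32767]
    print(to_int16(audio, peak_normalize=True))  # [7724, -30897, 32767]
```

A fixed gain treats every utterance the same, so quiet clips get quieter; peak normalization trades that for loudness that varies less between clips.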

Can I change this in config.json, or is there another way to fix it?

Reference:
https://github.com/NVIDIA/waveglow/issues/5#issuecomment-437592753

Ella77 commented 5 years ago

@rafaelvalle @azraelkuan

rafaelvalle commented 4 years ago

Closing due to inactivity.