I ran inference with the pretrained model, but the results for every dataset except the LJ dataset came out low-pitched.
In mel2samp.py
So, first I checked that the audio was correctly sampled as int16. Across items, abs(audio).max() has a maximum of 32768.0 and a minimum of 792.0. After dividing by 32768.0 (the normalization step), the audio's range does not exceed 1.
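For reference, the check described above can be sketched roughly as follows. This is a dependency-free illustration, not the code in mel2samp.py: the helper names `peak` and `peak_range` and the sample clips are hypothetical, and `MAX_WAV_VALUE` is just a label for the 32768.0 divisor mentioned above. With a NumPy array or torch tensor, the per-clip peak would simply be abs(audio).max().

```python
MAX_WAV_VALUE = 32768.0  # label for the divisor used in the normalization step


def peak(samples):
    """Largest absolute sample value in one clip."""
    return max(abs(s) for s in samples)


def peak_range(clips):
    """Return (min, max) of the per-clip peaks across a set of clips."""
    peaks = [peak(clip) for clip in clips]
    return min(peaks), max(peaks)


# Hypothetical int16-range clips standing in for real wav data.
raw_clips = [[0, 792, -300], [12000, -32768, 5]]

# Normalization step: divide every sample by 32768.0.
normalized_clips = [[s / MAX_WAV_VALUE for s in clip] for clip in raw_clips]

raw_min, raw_max = peak_range(raw_clips)
norm_min, norm_max = peak_range(normalized_clips)

print("raw peaks:", raw_min, raw_max)
print("normalized peaks:", norm_min, norm_max)
```

If the normalized maximum is at most 1, the training-side input is in the expected range, so any peak above 1 has to come from a later stage.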
But after the inference step in inference.py:
when I print abs(audio).max(), nearly half of the values exceed 1. I appended each max value to a list and checked the list's min and max: the min is 0.484375 and the max is 2.12109375.
I think this is the cause of the problem I ran into, because when I changed the line to audio = audio * 32768.0 / 2, the pitch of the resulting wav was normal, though with a little noise (the female voice was clearer, but the male voice was quieter). I suspect the 1.23 value cannot be a fixed constant. I think it varies with the dataset.
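One alternative to a fixed divisor would be to rescale each generated clip by its own peak before converting to int16. The sketch below assumes that approach and is not what inference.py does; the function name `to_int16_peak_normalized` and the `headroom` parameter are hypothetical. It only guards against clipping and does not explain why the model output exceeds 1 in the first place.

```python
INT16_MAX = 32767


def to_int16_peak_normalized(samples, headroom=0.95):
    """Scale one clip so its peak maps to headroom * INT16_MAX, then round.

    `headroom` (hypothetical parameter) leaves a small margin below full scale.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent or empty clip: nothing to scale.
        return [0 for _ in samples]
    scale = headroom * INT16_MAX / peak
    return [int(round(s * scale)) for s in samples]


# Clips whose peaks match the min and max values reported above.
quiet = [0.1, -0.484375, 0.2]
loud = [1.0, -2.12109375, 0.5]

quiet_out = to_int16_peak_normalized(quiet)
loud_out = to_int16_peak_normalized(loud)

print(quiet_out)
print(loud_out)
```

Both clips end up with the same peak level, so nothing clips, but the relative loudness between clips is lost. That trade-off may or may not be acceptable depending on the dataset.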
Can I change this in config.json, or is there another way to fix it?
Reference:
https://github.com/NVIDIA/waveglow/issues/5#issuecomment-437592753