begeekmyfriend opened this issue 5 years ago
Here is another evaluation result on a better corpus. Sounds very close to WaveNet, right? Even this crude, brute-force algorithm comes close to a NN vocoder... ad_48000.zip
With a bandpass filter added, there is less noise. biaobei_xizang_48000.zip
```python
import numpy as np
from scipy import signal
from scipy.io import wavfile

def save_wav(wav, path, hparams):
    wav = wav / np.abs(wav).max() * 0.999  # normalize to [-1, 1]
    f1 = 0.5 * 32767 / max(0.01, np.max(np.abs(wav)))  # linear gain towards int16 full scale
    f2 = np.sign(wav) * np.power(np.abs(wav), 0.7)  # sublinear volume scaling (k = 0.7)
    wav = f1 * f2
    # Bandpass FIR filter, proposed by @dsmiller
    wav = signal.convolve(wav, signal.firwin(
        hparams.num_freq, [hparams.fmin, hparams.fmax], pass_zero=False, fs=hparams.sample_rate))
    wavfile.write(path, hparams.sample_rate, wav.astype(np.int16))
```
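For reference, a minimal usage sketch; the hparams values below (num_freq, fmin, fmax, sample_rate) are assumptions for illustration, not values taken from this thread:

```python
from types import SimpleNamespace
import numpy as np

# Hypothetical hyperparameters; adjust to your own configuration.
hparams = SimpleNamespace(num_freq=513, fmin=55, fmax=7600, sample_rate=48000)
save_wav(np.random.uniform(-0.5, 0.5, 48000).astype(np.float32), 'eval.wav', hparams)
```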
Any further progress on the clip concatenation issue? @begeekmyfriend
@superhg2012 Here is a batch synthesis implementation: https://github.com/begeekmyfriend/Tacotron-2/commit/f3bdae8ef26d51fb28b28d5e7413180f144401c1
I am working on concatenating pre-recorded, high-quality sound clips. After using your code, the synthesized wave is noisy and the quality is bad. My sample rate is 8 kHz; how should I adjust the parameters in your code? @begeekmyfriend
```python
f2 = np.sign(wav) * np.power(np.abs(wav), 1.0)
```
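With the exponent set to 1.0, sign(x) * |x|^1.0 = x, so the sublinear compression is effectively disabled and only the linear gain f1 is applied.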
My process is as follows:
step 1: concatenate two high-quality source recording clips
step 2: adjust the synthesized sound with your method
The synthesized sound is still not clear. What should I do? @begeekmyfriend
```python
def concatenate(wav1, wav2):
    total_len = len(wav1) + len(wav2)
    res_wav = np.zeros(total_len)
    res_wav[:len(wav1)] = wav1
    res_wav[len(wav1):] = wav2
    return res_wav
```
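For what it's worth, NumPy has a built-in that does the same thing, so the helper above can be reduced to a one-liner:

```python
res_wav = np.concatenate([wav1, wav2])
```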
```python
fs = 8000  # assumed sample rate, per the question above
wav = wav / np.abs(wav).max() * 0.999
f1 = 0.5 * 32767 / max(0.01, np.max(np.abs(wav)))
f2 = np.sign(wav) * np.power(np.abs(wav), 1.0)  # exponent 1.0: no sublinear compression
wav = f1 * f2
# Proposed by @dsmiller; [60, 3999] keeps the band just below the 4 kHz Nyquist limit at fs = 8000
wav = signal.convolve(wav, signal.firwin(513, [60, 3999], pass_zero=False, fs=fs))
```
As for synthesis, we usually need to concatenate all the synthesized clips into one long wave file. However, one problem is that it is hard to keep the volume consistent across the different clips.
I have a rough workaround: apply non-linear scaling to each clip, so that quiet clips are amplified more and loud clips less. A rough formulation is y = sign(x) * |x|^k; when k > 1 it is called superlinear, and when k < 1 sublinear. In this case I applied the sublinear method, with k = 0.7, to each synthesized wav clip.
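A quick numerical sketch (not from the original post) of why sublinear scaling evens out volume: with k = 0.7 a quiet sample is roughly doubled while a full-scale sample is left unchanged:

```python
import numpy as np

for x in (0.1, 0.5, 1.0):
    print(f"{x:.1f} -> {np.sign(x) * np.abs(x) ** 0.7:.3f}")
# 0.1 -> 0.200, 0.5 -> 0.616, 1.0 -> 1.000
```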
In audio.py (see the save_wav snippet above):

Let us look at the effect. The evaluation now sounds steadier. However, when such scaling is applied, the frequency band outside the range of fmin and fmax is scaled as well, which can introduce noise. So we also need to filter out everything but the right frequency band; I am still working on that. Any suggestion is welcome. The current evaluation is provided below. wangdantong_22050.zip