Yablon / auorange

Audio LPC (linear prediction code) using mel spectrogram, compatible with LPCNet
Apache License 2.0

Will multi-band be supported in the future? #3

Open BridgetteSong opened 3 years ago

BridgetteSong commented 3 years ago

@Yablon Do you plan to support multi-band in the future? If not, could you tell me how to extract the LPC of each frequency sub-band? Thanks! https://arxiv.org/abs/2005.05551

Yablon commented 3 years ago

@BridgetteSong Yes, I plan to support multiband LPC prediction in a few months.

I think the key to multiband LPC prediction is finding a reasonable linear spectrogram for each band.

The first method: in the multiband processing of audio, we use convolutions to obtain the subband signals, which are time-domain signals. Convolution in the time domain corresponds to multiplication in the frequency domain. So we can get the linear spectrogram for each band by multiplying the full-band spectrogram by the FFT of the corresponding convolution (filter) coefficients.
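The convolution theorem behind the first method can be checked numerically. This is a minimal NumPy sketch, assuming a random frame `x` and a hypothetical 16-tap filter `h` standing in for one subband analysis filter (neither comes from the repository):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)   # stand-in for one audio frame (assumption)
h = rng.standard_normal(16)    # hypothetical subband filter taps (assumption)

# Time domain: linear convolution of the frame with the filter.
y_time = np.convolve(x, h)

# Frequency domain: zero-pad both to the full output length, multiply spectra.
n = len(x) + len(h) - 1
X = np.fft.rfft(x, n)
H = np.fft.rfft(h, n)
y_freq = np.fft.irfft(X * H, n)

max_err = np.max(np.abs(y_time - y_freq))

# The band's magnitude spectrum is the product of the two magnitude spectra.
band_mag = np.abs(X) * np.abs(H)
```

A real PQMF analysis stage also downsamples each band, which this sketch does not model.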

The second method: the four audio bands cover four frequency ranges from low to high, so we can simply split the linear spectrogram into four bands along the frequency axis.

After getting the linear spectrogram for each of the four bands, we can extract the LPC prediction using the methods in my repository.
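One common route from a linear magnitude spectrum to LPC coefficients is: power spectrum, then autocorrelation via the inverse FFT, then the Levinson-Durbin recursion. The sketch below illustrates that route only. The function name `linear_to_lpc_sketch` and the sign convention are assumptions for illustration, and the repository's own implementation may differ.

```python
import numpy as np

def linear_to_lpc_sketch(mag, order):
    """Estimate LPC coefficients a[1..order] from one frame's magnitude spectrum.

    mag has shape (n_fft // 2 + 1,). Convention (assumed here):
    prediction[t] = -sum_k a[k] * x[t - k].
    """
    power = np.asarray(mag, dtype=float) ** 2
    # The inverse FFT of the power spectrum is the autocorrelation.
    autocorr = np.fft.irfft(power)[: order + 1].copy()
    autocorr[0] += 1e-12  # avoid division by zero on silent frames

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = autocorr[0]
    for i in range(1, order + 1):
        # Reflection coefficient for step i of Levinson-Durbin.
        acc = autocorr[i] + np.dot(a[1:i], autocorr[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]

# Sanity check on a synthetic AR(1) spectrum: x[t] = 0.9 * x[t-1] + noise.
w = np.linspace(0.0, np.pi, 257)               # rfft bin frequencies for n_fft = 512
mag = 1.0 / np.abs(1.0 - 0.9 * np.exp(-1j * w))
lpc = linear_to_lpc_sketch(mag, order=2)
```

In this check the first coefficient should come out close to -0.9 and the second close to 0, since the synthetic spectrum is first-order.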

I have tried these two methods, and I will implement them in my repository.

I appreciate your advice. Do you have any other methods to extract the LPC for each frequency sub-band?

BridgetteSong commented 3 years ago

@Yablon My signal processing fundamentals are weak. As for the 2nd method:

  1. Split the linear spectrogram into four linear bands; given fmin=0 and fmax=8000, each sub-band spans 2000 Hz.
  2. Extract the mel spectrogram from each sub-band wav. Does the lower-frequency band get more convolutions?
  3. Or extract the mel spectrogram from the full-band wav, and split it into four parts according to the different numbers of convolutions.
  4. I tried:

         fmin = [0, 2000, 4000, 6000]
         fmax = [2000, 4000, 6000, 8000]
         mel_filters = [43, 18, 11, 8]
         wav_data = auorange.load_wav(wav_name, 22050)
         wav_mb = pqmf.analysis(wav_data)
         for i in range(len(fmin)):
             audio_processor = auorange.LibrosaAudioFeature(sample_rate, n_fft, mel_filters[i], hop_length, win_length, audio_lpc, fmin=fmin[i], fmax=fmax[i])
             mel_spec = audio_processor.mel_spectrogram(wav_mb[i])
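The filter counts `[43, 18, 11, 8]` sum to 80 and appear consistent with dividing 80 mel filters among the four 2000 Hz bands in proportion to each band's width on the mel scale. This is a minimal sketch of that allocation, assuming the HTK mel formula and a total of 80 filters (both are assumptions, not stated in the thread):

```python
import numpy as np

def hz_to_mel_htk(freq_hz):
    """HTK-style mel scale (assumed; librosa's default is the Slaney variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

band_edges_hz = [0, 2000, 4000, 6000, 8000]
total_filters = 80  # assumption: 43 + 18 + 11 + 8

mel_edges = hz_to_mel_htk(band_edges_hz)
# Share of the total mel range covered by each band.
share = np.diff(mel_edges) / (mel_edges[-1] - mel_edges[0])
mel_filters = np.round(share * total_filters).astype(int).tolist()
print(mel_filters)
```

Lower bands receive more filters because the mel scale is denser at low frequencies.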

BridgetteSong commented 3 years ago

PS: another similar project, LPC_for_TTS, is also considering multi-band support in the future, and its author said he needs help. Maybe you can work on it together.

Yablon commented 3 years ago

@BridgetteSong I have never tried your methods, but I think they are all OK as long as the mathematics works out.

Different QMF implementations do differ. I will first provide a more accurate QMF implementation and then use it to get LPC predictions.

I plan to do all of this using (self-defined) TensorFlow operations, in TensorFlow 2.

You can do more experiments and try more methods. If you or others have better results, I am happy to learn and improve.

Yablon commented 3 years ago

Please keep this issue open and I will update my progress here.

BridgetteSong commented 3 years ago

@Yablon I tried a new method:

1. Scale lpc_order by 1/n_bands:

2. Keep the mel spectrogram the same, and calculate the mel spectrogram for each subband wav:

3. Merge the bands:

4. The predicted audio is normal. [image: prediction]

5. The code is as follows: [code]

Yablon commented 3 years ago

@BridgetteSong I think that is a good experiment. I have several suggestions:

  1. Plot and save the 4 bands separately at the end. Each band is itself a wav that you can listen to and inspect.
  2. I have never tried extracting spectrograms from subbands. In TTS, changing the input spectrogram means retraining not only the vocoder but also the acoustic model, which brings more uncertainty and more work. You may want to reconsider whether it is really necessary to extract the mel spectrogram from the subbands instead of the full band.

BridgetteSong commented 3 years ago

Sorry, I haven't had much time for this recently. I have tried keeping the mel spectrogram unchanged to predict the LPC.

  1. Keep the same mel spectrogram extraction, so the LPC from the mel spectrogram is the same as for the full band.
  2. Split the wav into n_bands.
  3. Calculate the prediction and error using each band wav and the mel spectrogram.

PS: A smaller lpc_order leads to better results, and I don't know why. For example, lpc_order=2 gives the best result.

The code is as follows: [code]

Plot: [image]

Yablon commented 3 years ago

@BridgetteSong I am preparing for my wedding these days. Sorry that I have not committed any code.

Using the full-band LPC for multiband is not appropriate, because the spectrum differs from band to band.

You can try this method to get the spectrogram for multiband samples:

  1. Get the linear spectrogram S. Its length along the frequency axis equals the FFT size N, and due to the symmetry of the FFT we only use the first N // 2 + 1 points. The spectrogram of each band has Nb = N // num_bands points.
  2. first band: S[0 : Nb // 2 + 1]
  3. second band: S[Nb // 2 : Nb // 2 * 2 + 1]
  4. third band: S[Nb // 2 * 2 : Nb // 2 * 3 + 1]
  5. fourth band: S[Nb // 2 * 3 : Nb // 2 * 4 + 1]

You can then use the four bands to calculate the LPCs for each band, the same way as for the full band.
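The slicing rule above can be sketched as follows. The function name `split_linear_spectrogram` and the `(frequency bins, frames)` layout of `S` are assumptions for illustration, and the method later committed to the repository may differ.

```python
import numpy as np

def split_linear_spectrogram(S, n_fft, num_bands):
    """Split S (shape: n_fft // 2 + 1 bins x frames) into num_bands sub-spectrograms.

    Each band has Nb // 2 + 1 bins, where Nb = n_fft // num_bands, so it looks
    like the one-sided spectrum of an Nb-point FFT. Neighbouring bands share
    one boundary bin.
    """
    half = (n_fft // num_bands) // 2
    return [S[i * half:(i + 1) * half + 1] for i in range(num_bands)]

n_fft, num_bands, n_frames = 1024, 4, 10
rng = np.random.default_rng(0)
S = np.abs(rng.standard_normal((n_fft // 2 + 1, n_frames)))  # stand-in spectrogram

bands = split_linear_spectrogram(S, n_fft, num_bands)
band_shapes = [b.shape for b in bands]
```

Note that Nb is derived from the FFT size N, not from the number of bins N // 2 + 1. Using the bin count instead would make the four slices cover only about half of the spectrum.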

BridgetteSong commented 3 years ago

Happy wedding! I tried your advice. What I can do is change only the _mel_tolpc function

some samples overflow

predict

Yablon commented 3 years ago

@BridgetteSong OK, I will commit some code this Saturday. After that you can try again. You can also open a pull request to this repository.

Yablon commented 3 years ago

@BridgetteSong Hi, I have committed a method for splitting the linear spectrogram into multiband spectrograms. I hope it helps you get multiband LPCs.

liujshi commented 3 years ago

Happy wedding! I tried your advice. What I can do is change only the _mel_tolpc function

    def mel_to_lpc(self, mel, i, n_bands):
        inv_linear = self.mel_to_linear(mel)
        nb = inv_linear.shape[0] // n_bands
        return self.lpc_extractor.linear_to_lpc(inv_linear[i * (nb // 2):((i + 1) * (nb // 2) + 1)], repeat=self.hop_length)

some samples overflow

predict

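1. PQMF sometimes makes the predicted value exceed 1; see whether there is a better band-splitting method.
2. You can also divide by 4 at prediction time, which avoids the overflow.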
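The overflow and the suggested scaling can be illustrated with a small sketch. The helper `lpc_predict`, the synthetic band signal, and the coefficients are all hypothetical and chosen only to force an overflow. Whether the residual should then be computed against the scaled prediction is not specified in the thread.

```python
import numpy as np

def lpc_predict(signal, lpc):
    """Linear prediction: pred[t] = -sum_k lpc[k-1] * signal[t-k], zero history."""
    signal = np.asarray(signal, dtype=float)
    order = len(lpc)
    padded = np.concatenate([np.zeros(order), signal])
    pred = np.zeros_like(signal)
    for k in range(1, order + 1):
        # padded[order - k + t] is signal[t - k] (or 0 before the start).
        pred -= lpc[k - 1] * padded[order - k:order - k + len(signal)]
    return pred

t = np.arange(200)
band = 0.9 * np.sin(2 * np.pi * t / 50)   # synthetic subband signal within [-1, 1]
lpc = np.array([-1.9, 0.0])               # hypothetical coefficients that overshoot

pred = lpc_predict(band, lpc)
n_bands = 4
overflow_before = int(np.sum(np.abs(pred) > 1.0))
overflow_after = int(np.sum(np.abs(pred / n_bands) > 1.0))
print(overflow_before, overflow_after)
```

Dividing by the number of bands keeps the prediction in range here, but it also changes the prediction itself, so any downstream residual has to use the same scaling consistently.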