gongchenghhu / TacoLPCNet-demo

code #1

Open X-CCS opened 3 years ago

X-CCS commented 3 years ago

Hello, and thank you for sharing! I have been studying LPCNet recently. Do you have corresponding code that I can run to train it myself?

gongchenghhu commented 3 years ago

https://github.com/alokprasad/LPCTron you can try that.

Kristopher-Chen commented 2 years ago

Hi, it seems Tacotron2 uses the mel-spectrum as its output features, but LPCNet still uses BFCCs. How should I change the LPCNet features into mel-spectrograms? From your paper, it seems what I should do is to change the 20-dim features into 80-dim mels and remove the pitch embedding. Is that the case?
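For reference, a rough sketch of computing 80-band log-mel frames in place of LPCNet's 20-dim features. The 16 kHz rate, the 160-sample hop (LPCNet's 10 ms frame), the 320-sample Hann window, the 512-point FFT and the HTK-style mel scale are all assumptions; whatever settings the Tacotron2 front end actually uses should be copied exactly, so the vocoder sees the same features at training and synthesis time.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale (assumed here; librosa defaults to the Slaney scale)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr=16000, n_fft=512, n_mels=80, fmin=0.0, fmax=None):
    """Triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 equally spaced mel points give the left/centre/right band edges
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bin_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, bin_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_frames(x, sr=16000, hop=160, win=320, n_fft=512, n_mels=80):
    """Return an (n_frames, n_mels) log-mel matrix, one row per hop."""
    x = np.asarray(x, dtype=np.float64)
    window = np.hanning(win)
    fb = mel_filterbank(sr, n_fft, n_mels)
    n_frames = 1 + max(0, (len(x) - win) // hop)
    out = np.empty((n_frames, n_mels))
    for t in range(n_frames):
        frame = x[t * hop : t * hop + win]
        if len(frame) < win:  # zero-pad a short final frame
            frame = np.pad(frame, (0, win - len(frame)))
        power = np.abs(np.fft.rfft(frame * window, n=n_fft)) ** 2
        out[t] = np.log(fb @ power + 1e-10)  # small offset avoids log(0)
    return out
```

Each row of `log_mel_frames(x)` would take the place of the per-frame feature vector that the original LPCNet builds from BFCCs and pitch.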

Two more questions.

  1. Usually the mel-spectrum is calculated from float data, but in LPCNet, int16 is required. Does this mean I should extract mels from int16 data too? And would I then need to quantize the Tacotron2 output mels?
  2. If I want to try 24 kHz, is there anything special I should pay attention to in LPCNet?

Thanks a lot.

gongchenghhu commented 2 years ago

This demo is based on the paper https://www.isca-speech.org/archive/pdfs/interspeech_2021/gong21_interspeech.pdf, so the details can be found there.

Kristopher-Chen commented 2 years ago

Thank you! I have just read the paper and want to confirm the details so that I can implement it. The first question is about LPCTron, and the latter two are about your paper.

gongchenghhu commented 2 years ago

@Kristopher-Chen For the above two questions:

  1. The original LPCNet also uses the float type: https://github.com/alokprasad/LPCTron/blob/2337b5a1a5b3386c58cb7b3bac92b009721c9495/LPCNet/src/dump_data.c#L57
  2. Sorry, I haven't tried it at 24 kHz.

Regarding changing the 20-dim features into 80-dim mels and removing the pitch embedding: that is right.
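To illustrate the point about types: the training audio is only stored as int16 on disk, and after decoding the features are computed in floating point, so under that reading the mels can be extracted from float samples and Tacotron2's output mels would not need quantizing. A minimal sketch assuming raw little-endian 16-bit PCM (for example a `.s16` file); the function names and the `normalize` flag are hypothetical, not options LPCNet exposes.

```python
import numpy as np


def pcm16_to_float(raw, normalize=False):
    """Decode raw little-endian 16-bit PCM bytes to float32 samples.

    int16 only describes the on-disk storage; feature extraction afterwards
    runs in floating point. Whether to rescale to [-1, 1) is an assumption:
    keep the choice identical for training-data extraction and synthesis.
    """
    x = np.frombuffer(raw, dtype="<i2").astype(np.float32)
    return x / 32768.0 if normalize else x


def float_to_pcm16(x, normalize=False):
    """Convert float samples back to int16 bytes, clipping to the int16 range."""
    y = np.asarray(x, dtype=np.float64)
    if normalize:
        y = y * 32768.0
    y = np.clip(np.round(y), -32768, 32767).astype("<i2")
    return y.tobytes()
```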

Kristopher-Chen commented 2 years ago

@gongchenghhu One more question: how do I extract LPCs from mel-spectrograms? Is there any reference code?

gongchenghhu commented 2 years ago

@Kristopher-Chen This code may be useful. https://github.com/Yablon/auorange
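One common way to get LPCs from a mel-spectrogram is: approximate the linear power spectrum with the pseudo-inverse of the mel filterbank, take its inverse FFT to obtain the autocorrelation (Wiener-Khinchin), and run the Levinson-Durbin recursion. The sketch below is written under assumptions (HTK-style filterbank, 512-point FFT, order 16, a 1e-4 white-noise correction) and is not a copy of auorange or of the paper's code. The mel input must be linear power, so undo any log first.

```python
import numpy as np


def mel_basis(sr=16000, n_fft=512, n_mels=80):
    """HTK-style triangular mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    def mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    edges = hz(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, bins.size))
    for m in range(n_mels):
        lo, c, hi = edges[m : m + 3]
        fb[m] = np.maximum(0.0, np.minimum((bins - lo) / (c - lo), (hi - bins) / (hi - c)))
    return fb


def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation r[0..order] -> (a, err).

    a[0] == 1 and A(z) = sum_k a[k] z^-k, so the prediction is
    x[n] ~= -sum_{k>=1} a[k] x[n-k]; err is the final prediction error.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err  # reflection coefficient
        a[: i + 1] = a[: i + 1] + k * a[i::-1]
        err *= 1.0 - k * k
    return a, err


def lpc_from_mel(mel_power, fb_pinv, order=16, noise_floor=1e-4):
    """Approximate LPC coefficients from one frame of mel *power* (not log)."""
    # 1. Back to an approximate linear power spectrum; clip negative values
    #    that the pseudo-inverse can produce.
    power = np.maximum(fb_pinv @ mel_power, 1e-10)
    # 2. The inverse real FFT of a power spectrum is the autocorrelation.
    r = np.fft.irfft(power)[: order + 1]
    # 3. White-noise correction keeps the recursion well conditioned.
    r[0] *= 1.0 + noise_floor
    a, _ = levinson(r, order)
    return a
```

In practice `np.linalg.pinv(fb)` would be computed once and `lpc_from_mel` called per frame.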

Kristopher-Chen commented 2 years ago

Thank you! But when testing, some clipping occurs, as shown below, and it makes the training data terrible. How did you deal with this?

[image]
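One way to at least quantify the clipping before deciding what to drop from the training set is to count the samples that reach the int16 limits. A minimal sketch; the function names and the 0.1% threshold are arbitrary assumptions, not values from this thread.

```python
import numpy as np

INT16_MAX = 32767


def clip_ratio(x, limit=INT16_MAX):
    """Fraction of samples whose magnitude reaches the int16 limit."""
    x = np.asarray(x, dtype=np.float64)
    if x.size == 0:
        return 0.0
    return float(np.mean(np.abs(x) >= limit))  # -32768 also counts


def is_badly_clipped(x, max_ratio=1e-3):
    """Flag a signal when more than max_ratio of its samples are clipped."""
    return clip_ratio(x) > max_ratio
```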

Kristopher-Chen commented 2 years ago

@gongchenghhu One more question about the sampling part. Since no pitch features are provided, did you simply skip the temperature adjustment that the original LPCNet applies based on the pitch correlation?
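For context, a sketch of the pitch-dependent sampling being asked about. The constants are my recollection of the LPCNet paper and reference code (the distribution is raised to the power 1 + max(0, 1.5·g − 0.5), where g is the pitch correlation, then a floor of 0.002 is subtracted and the result renormalized) and should be checked against the source. Skipping the sharpening when no pitch feature exists (`pitch_corr=None`) is only one hypothetical option; the thread does not confirm what the demo does.

```python
import numpy as np


def sample_with_pitch_temperature(pdf, pitch_corr=None, floor=0.002, rng=None):
    """Sample an index from `pdf` with LPCNet-style pitch-dependent sharpening.

    Hypothetical helper: with pitch_corr=None the sharpening is skipped and
    only the floor is applied.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(pdf, dtype=np.float64)
    p = p / p.sum()
    if pitch_corr is not None:
        boost = max(0.0, 1.5 * pitch_corr - 0.5)
        p = p ** (1.0 + boost)  # lower temperature for strongly voiced frames
        p = p / p.sum()
    p = np.maximum(p - floor, 0.0)  # drop the low-probability tail
    if p.sum() == 0.0:  # floor removed everything: fall back to the mode
        return int(np.argmax(pdf))
    p = p / p.sum()
    return int(rng.choice(p.size, p=p))
```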

gongchenghhu commented 2 years ago

@Kristopher-Chen For the pitch question: I removed the pitch-related network from the original LPCNet, and the input to LPCNet does not contain any pitch information.

Kristopher-Chen commented 2 years ago

@gongchenghhu Sorry, I meant pitch in the sampling part, not the network.

[image]

I still haven't solved the divergence problem caused by extracting LPCs from the mel-spectrogram.
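The thread does not establish what causes the divergence, but one hypothetical first check is whether the LPC filters recovered from the mels are stable, that is, whether every zero of A(z) lies inside the unit circle. If they are not, bandwidth expansion (scaling a[k] by gamma^k, which shrinks every zero's radius by gamma) is a standard remedy. A minimal sketch; gamma=0.98 is an arbitrary assumption.

```python
import numpy as np


def is_stable(a):
    """True if all zeros of A(z) = a[0] + a[1] z^-1 + ... lie inside the unit circle."""
    a = np.asarray(a, dtype=np.float64)
    if a.size <= 1:
        return True
    return bool(np.all(np.abs(np.roots(a)) < 1.0))


def bandwidth_expand(a, gamma=0.98):
    """Scale a[k] by gamma**k, i.e. A(z / gamma): every zero moves to gamma times its radius."""
    a = np.asarray(a, dtype=np.float64)
    return a * gamma ** np.arange(a.size)
```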