MingjieChen / EasyVC

A toolkit for any-to-any encoder-decoder voice conversion systems
https://mingjiechen.github.io/easyvc/index.html
Apache License 2.0
80 stars 8 forks

Any current results? #2

Open · unilight opened this issue 1 year ago

unilight commented 1 year ago

Hi Mingjie, very cool project and very nice work!! This is exactly what we need -- more comparisons and analyses. Just wondering if you have any insights you can share? Although I bet you would rather write them in a paper :)

MingjieChen commented 1 year ago

Hello Wen-Chin,

I can share some insights I found in experiments.

  1. Some HuBERT-based linguistic encoders (e.g. hubert_soft, content_vec) still leak speaker information, even though disentanglement learning methods have been applied to them.
  2. DiffWave as a decoder generates good-quality waveforms, but it ignores the given target speaker information at inference time and simply reconstructs the source speech. I am still looking into this problem.

In terms of results, I am still debugging and running training jobs on the limited number of GPUs in our lab, so I need some more time (roughly one or two months) before I have formal results that can be shared.

I would be happy to receive suggestions, pull requests, or further collaboration.

unilight commented 1 year ago

Hmm, I am happy to collaborate, but I am not sure what the end goal of this project is, so I am not sure how I can help.