Difference between SSL and PPG-based methods?

bshall / soft-vc

Soft speech units for voice conversion

MIT License

Hi @Kristopher-Chen, thanks for the feedback!

There are definite similarities between PPGs (phonetic posteriorgrams) and the Soft Speech Units we proposed. The main difference is that soft units don't require text transcriptions to train. This can be useful for training VC systems in languages without large corpora of annotated speech. Additionally, non-speech sounds such as laughter and breathing may be captured better by soft units than by PPGs. Unfortunately, I haven't compared the two approaches directly yet. I think it would be a useful benchmark, but I haven't had the chance to look into it.

Difference between SSL and PPG-based methods? #6