MasayaKawamura / MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Apache License 2.0
401 stars 64 forks

Style transfer? #17

Open RK-BAKU opened 1 year ago

RK-BAKU commented 1 year ago

@MasayaKawamura

Is it possible to do style transfer using the available models? If not, how feasible would it be to implement it in the MB/MS models? Thanks in advance!

MasayaKawamura commented 1 year ago

Hi @RK-BAKU, I apologize for the delay in replying. I think one of the key points for style transfer is to apply latent features (something like the multi-speaker setup) to this model. VITS' speaker-independent representation is described here (Section D of the VITS paper). Maybe this issue is helpful for you.
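To make the idea above a bit more concrete, here is a minimal NumPy sketch of how a speaker-conditioned invertible flow can swap the conditioning on a latent: map the latent to a speaker-independent space using the source embedding, then invert the flow using the target embedding. This is only a toy illustration under assumptions, not code from this repository or from VITS. The names `speaker_table`, `W_mu`, `W_logs`, `flow`, and `convert` are hypothetical, the parameters are random rather than trained, and a single elementwise affine transform stands in for the real coupling layers.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SPEAKERS, EMB_DIM, LATENT_DIM = 2, 4, 3

# Hypothetical "learned" parameters (random here, for illustration only).
speaker_table = rng.normal(size=(N_SPEAKERS, EMB_DIM))  # speaker/style embeddings g
W_mu = rng.normal(size=(EMB_DIM, LATENT_DIM))           # maps g to a shift
W_logs = 0.1 * rng.normal(size=(EMB_DIM, LATENT_DIM))   # maps g to a log-scale


def flow(z, speaker_id, reverse=False):
    """Toy speaker-conditioned elementwise affine flow (invertible)."""
    g = speaker_table[speaker_id]
    mu, logs = g @ W_mu, g @ W_logs
    if reverse:
        # Inverse direction: speaker-independent latent -> speaker-specific latent.
        return z * np.exp(logs) + mu
    # Forward direction: speaker-specific latent -> speaker-independent latent.
    return (z - mu) * np.exp(-logs)


def convert(z, src_id, tgt_id):
    """Remove the source conditioning, then apply the target conditioning."""
    z_p = flow(z, src_id)                     # speaker-independent representation
    return flow(z_p, tgt_id, reverse=True)    # re-conditioned on the target


z_src = rng.normal(size=(5, LATENT_DIM))  # 5 frames of latent features
z_same = convert(z_src, 0, 0)             # round trip with the same speaker
z_conv = convert(z_src, 0, 1)             # conditioning swapped to speaker 1
```

In a real model the converted latent would then be passed to the decoder together with the target embedding. For style transfer, the embedding would presumably come from a style or reference encoder instead of a speaker lookup table, which is an assumption on my part and not something the available models are confirmed to support.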