LucasSheng / avatar-net

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration
https://lucassheng.github.io/avatar-net/

Can this model be applied to discrete time sequence? #13

Open youngsuenXMLY opened 5 years ago

youngsuenXMLY commented 5 years ago

For speech audio signals, voice conversion is becoming increasingly popular. I wonder if zero-shot style transfer can be used for voice conversion, for example converting a source speaker's voice (sv) to a target speaker's voice (tv): extract the style (prosody, stress, accent, and so on) of sv and the content (timbre and characters) of tv, then mix the style and content. I'm really looking forward to your reply, thank you.
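To make the idea concrete, here is a minimal sketch (not from Avatar-Net itself) of the simplest form of feature decoration, an AdaIN-style statistic swap, applied to hypothetical mel-spectrogram features. The feature shapes and the assumption that per-channel statistics over time capture "style" are illustrative only; a real voice-conversion system would extract these features with a trained speech encoder.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """AdaIN-style statistic matching: whiten the content features,
    then re-color them with the style features' per-channel statistics.
    Both inputs are (channels, time) arrays, e.g. mel bins over frames."""
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True)
    s_mean = style.mean(axis=1, keepdims=True)
    s_std = style.std(axis=1, keepdims=True)
    return s_std * (content - c_mean) / (c_std + eps) + s_mean

# Toy stand-ins for encoder features: (80 mel bins, variable frames).
rng = np.random.default_rng(0)
content_feats = rng.normal(0.0, 1.0, size=(80, 120))  # "content" utterance
style_feats = rng.normal(3.0, 2.0, size=(80, 150))    # "style" utterance

out = adain(content_feats, style_feats)
# The output keeps the content's temporal layout but carries the
# style's per-channel mean and (approximately) standard deviation.
```

Avatar-Net's style decorator goes further than this (patch-based matching at multiple scales), but the statistic-swap view above is the core question: whether speaker identity and prosody are separable in the feature statistics of a speech encoder the way texture is for images.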