Closed — lmaxwell closed this issue 5 years ago
Sorry, I did not notice that the z-encoder is not used in the violin modelling.
If singing voice is trained together with lots of instruments (e.g. the NSynth dataset), interpolating the z-vector would result in cross-synthesis of instrument and singing voice. I'm interested in what that sounds like. Do you think it is feasible?
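For reference, the interpolation I have in mind is just a linear blend between two latent codes before decoding. A minimal sketch (the variable names and the 16-dim latent size are only illustrative assumptions, not the model's actual configuration):

```python
import numpy as np

def interpolate_z(z_a, z_b, alpha):
    """Linearly interpolate between two latent vectors.

    alpha = 0.0 returns z_a (e.g. an instrument timbre),
    alpha = 1.0 returns z_b (e.g. a singing-voice timbre).
    """
    return (1.0 - alpha) * z_a + alpha * z_b

# Toy example with random 16-dimensional latents standing in
# for encoder outputs of two different sources.
z_instrument = np.random.randn(16)
z_voice = np.random.randn(16)

# Halfway point between the two timbres; the decoder would then
# synthesize audio from z_mix.
z_mix = interpolate_z(z_instrument, z_voice, 0.5)
```

Sweeping `alpha` from 0 to 1 and decoding each step would give the morphing trajectory I'd like to listen to.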
My guess is that the two domains might be too distinct to obtain real "cross-synthesis" if we use the model as is (also because it is a simple deterministic AE for now). Maybe enforcing domain confusion on part of the latent space could help avoid a large gap between the two distributions.
Thanks for your comment. I will spend some time doing some experiments. Closing the issue now.
We have not tried it, but the current structure of the "synthesizer" is not well suited to singing voice (formants are not well modeled by sinusoids). I guess a Neural Source Filter as the output module could be a better choice for this ;)