fkwlqm opened this issue 1 year ago
It is possible to replace our posterior encoder (STFT + Yingram) and decoder with EnCodec, but doing so would lose pitch controllability. After this March, seeing that VALL-E and SPEAR-TTS succeeded in achieving high controllability with RVQ, we also considered replacing VQ with RVQ, but this work is out of our hands now.
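For readers unfamiliar with the VQ vs. RVQ distinction mentioned above, here is a minimal pure-Python sketch of residual vector quantization (RVQ). It assumes small hand-picked toy codebooks; real codecs such as EnCodec learn their codebooks during training and operate on batched tensors. This is an illustration of the general technique, not code from this repository. Plain VQ is the single-stage case: one codebook, one index per vector. RVQ adds stages, each quantizing the residual the previous stages left over, so the reconstruction is the sum of one codeword per stage.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def rvq_encode(codebooks: Sequence[Sequence[Vector]], x: Vector) -> List[int]:
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword; the next stage sees only the remainder.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks: Sequence[Sequence[Vector]], codes: List[int]) -> Vector:
    """Reconstruction is the sum of the selected codewords across all stages."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def sq_error(a: Vector, b: Vector) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
STAGE1 = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
STAGE2 = [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]]

x = [1.2, 0.8]
codes = rvq_encode([STAGE1, STAGE2], x)          # one discrete token per stage
recon = rvq_decode([STAGE1, STAGE2], codes)
vq_only = rvq_decode([STAGE1], codes[:1])        # plain single-stage VQ
```

With these toy values, `codes` is `[1, 1]` and `recon` is `[1.25, 0.75]`; the squared error drops from 0.08 with the first stage alone to 0.005 with both stages, which is the reason RVQ-based codecs can reach higher fidelity with several small codebooks.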
Thanks for your work and comment!
Hello, is it possible to use EnCodec to replace the posterior encoder + decoder? (Sorry for the noob question.) In that case, how would the flow model be made to predict discrete tokens? Thanks.