anonymous-pits / pits

PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
https://anonymous-pits.github.io/pits/
MIT License

How to use encodec for PosteriorEncoder+Decoder? #17

Open fkwlqm opened 1 year ago

fkwlqm commented 1 year ago

Hello, is it possible to use EnCodec in place of the PosteriorEncoder + decoder? (Sorry for the noob question.) If so, how can the flow model be made to predict discrete tokens? Thanks.

anonymous-pits commented 1 year ago

It is possible to replace our posterior (STFT + yingram) encoder and decoder with EnCodec, but the model will lose pitch controllability. After this March, once VALL-E and SPEAR-TTS had succeeded in achieving high controllability with RVQ, we also considered replacing VQ with RVQ, but that work is out of our hands now.
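
For readers unfamiliar with the VQ versus RVQ distinction mentioned above, here is a minimal sketch of residual vector quantization in plain Python. It is an illustration only: the function names (`nearest`, `rvq_encode`, `rvq_decode`) and the toy codebooks are hypothetical and are not taken from PITS or EnCodec. Each stage quantizes whatever residual the previous stage left behind, so one frame becomes a short list of discrete tokens, one per stage.

```python
# Minimal sketch of residual vector quantization (RVQ).
# All names and codebook values are hypothetical, for illustration only.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def dist(entry):
        return sum((e - v) ** 2 for e, v in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage; each stage encodes the previous residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen entry so the next stage sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(codebooks, indices):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, indices):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Two toy stages: a coarse codebook followed by a finer one.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]
tokens = rvq_encode(codebooks, [1.2, 0.9])   # one token per stage
approx = rvq_decode(codebooks, tokens)       # sum of the chosen entries
```

Plain VQ corresponds to using a single stage; adding stages refines the approximation without enlarging any one codebook.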

fkwlqm commented 1 year ago

Thanks for your work and comment!