0913ktg / SC_VALL-E

Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E
MIT License
134 stars, 17 forks

pretrained weight? #1

Open. seastar105 opened this issue 1 year ago

seastar105 commented 1 year ago

As mentioned in the paper, will you provide pretrained weights for the model?

Also, reconstructing audio from the EnCodec tokens with Vocos may improve the quality of the audio output.

0913ktg commented 1 year ago

Hi, seastar105.

We plan to release the final checkpoints for the pretrained model once we have thoroughly reviewed them and are confident there are no issues. As for using a vocoder to reconstruct audio from the EnCodec tokens, we will definitely look into it as a way to improve the audio quality.

Thank you for your interest, and we appreciate your patience. If you have any further questions or suggestions, feel free to let us know!

yilinyang7 commented 1 year ago

Hi @0913ktg, do you have a sample page to listen to?

0913ktg commented 1 year ago

Hello @yilinyang7. I have added a link to the audio samples of SC VALL-E in the README.

Thank you for your interest, and we appreciate your patience. If you have any further questions or suggestions, feel free to let us know!