Plachtaa / VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
MIT License

Request: Support for Descript Audio Codec (High-Fidelity Audio Compression with Improved RVQGAN) #130

Status: Open. Opened by GUUser91 1 year ago.

GUUser91 commented 1 year ago

With Descript Audio Codec, you can compress 44.1 kHz audio into discrete codes at a low 8 kbps bitrate. This universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio. It can be used as a drop-in replacement for EnCodec in all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.).

Repository: https://github.com/descriptinc/descript-audio-codec
Demo: https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5
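
For readers wondering where the 8 kbps figure comes from, here is a rough back-of-the-envelope sketch. The configuration values (a hop length of 512 samples, 9 residual codebooks, 1024 entries per codebook) are assumptions based on the published description of the 44.1 kHz DAC model and are not stated in this thread; the function names are made up for illustration.

```python
import math


def frame_rate_hz(sample_rate: int, hop_length: int) -> float:
    """Number of code frames emitted per second of audio."""
    return sample_rate / hop_length


def bitrate_bps(sample_rate: int, hop_length: int,
                n_codebooks: int, codebook_size: int) -> float:
    """Bitrate of a residual-VQ codec: frames/s * codebooks * bits per code."""
    bits_per_code = math.log2(codebook_size)
    return frame_rate_hz(sample_rate, hop_length) * n_codebooks * bits_per_code


# Assumed settings for the 44.1 kHz DAC model (not confirmed in this thread).
DAC_44KHZ = dict(sample_rate=44_100, hop_length=512,
                 n_codebooks=9, codebook_size=1024)

dac_bps = bitrate_bps(**DAC_44KHZ)
pcm_bps = 44_100 * 16  # mono 16-bit PCM, for comparison

print(f"DAC bitrate: {dac_bps / 1000:.2f} kbps")
print(f"Compression vs. 16-bit PCM: {pcm_bps / dac_bps:.0f}x")
```

Under these assumptions the bitrate comes out a little under 8 kbps, which matches the rounded figure quoted above.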

Plachtaa commented 1 year ago

I agree that Descript Audio Codec is a better option, but you would have to retrain the whole model.
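
A small sketch of why swapping the codec likely means retraining: the language model is trained to predict a grid of codec tokens, and the two codecs produce grids of different shape. The numbers below are assumptions, not taken from this thread: EnCodec at 24 kHz and 6 kbps is assumed to use a hop of 320 samples with 8 codebooks, and DAC at 44.1 kHz is assumed to use a hop of 512 samples with 9 codebooks. `CodecLayout` is a hypothetical helper, not part of either library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecLayout:
    """Hypothetical description of a codec's token layout."""
    name: str
    sample_rate: int
    hop_length: int
    n_codebooks: int

    def token_grid(self, seconds: float) -> tuple[int, int]:
        """Return (codebooks, frames) for a clip of the given duration."""
        n_samples = round(seconds * self.sample_rate)
        return self.n_codebooks, math.ceil(n_samples / self.hop_length)


# Assumed layouts (not confirmed in this thread).
ENCODEC_24KHZ = CodecLayout("EnCodec 24 kHz @ 6 kbps", 24_000, 320, 8)
DAC_44KHZ = CodecLayout("DAC 44.1 kHz", 44_100, 512, 9)

for codec in (ENCODEC_24KHZ, DAC_44KHZ):
    print(codec.name, codec.token_grid(10.0))
```

Even if the shapes matched, the code indices of one codec carry no meaning in the other codec's codebooks, so the token embeddings and prediction heads learned on EnCodec tokens would not transfer.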