Plachtaa / VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
MIT License
7.42k stars, 747 forks

Request: Support for Descript Audio Codec (High-Fidelity Audio Compression with Improved RVQGAN) #130

Open GUUser91 opened 8 months ago

GUUser91 commented 8 months ago

With Descript Audio Codec, you can compress 44.1 kHz audio into discrete codes at a low 8 kbps bitrate. This universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio. It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.).

Repository: https://github.com/descriptinc/descript-audio-codec
Demo: https://descript.notion.site/Descript-Audio-Codec-11389fce0ce2419891d6591a68f814d5
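
For a rough sense of where the 8 kbps figure comes from, the arithmetic can be sketched as below. The hop length (512 samples), number of codebooks (9), and codebook size (1024) are assumptions about the 44.1 kHz DAC configuration and are not stated in this thread; `rvq_bitrate_bps` is a hypothetical helper, not part of any library.

```python
import math


def rvq_bitrate_bps(sample_rate_hz, hop_length, n_codebooks, codebook_size):
    """Bitrate of a residual-vector-quantized codec, in bits per second."""
    # One frame of codes is emitted every `hop_length` input samples.
    frame_rate_hz = sample_rate_hz / hop_length
    # Each frame carries one index per codebook; an index costs log2(size) bits.
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frame_rate_hz * bits_per_frame


# Assumed 44.1 kHz DAC settings (not confirmed by this thread).
dac_bps = rvq_bitrate_bps(
    sample_rate_hz=44_100, hop_length=512, n_codebooks=9, codebook_size=1024
)
print(f"{dac_bps / 1000:.2f} kbps")  # prints "7.75 kbps"
```

Under these assumptions the result is about 7.75 kbps, which rounds to the quoted 8 kbps.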

Plachtaa commented 8 months ago

I agree that Descript Audio Codec is a better option, but you would have to retrain the whole model.
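
One way to see why a codec swap implies retraining is to compare the token grids the two codecs produce for the same clip. The sketch below is illustrative only: `CodecLayout` is a hypothetical class, and the numbers (EnCodec at 24 kHz with hop 320 and 8 codebooks; DAC at 44.1 kHz with hop 512 and 9 codebooks) are assumptions rather than values taken from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecLayout:
    """Hypothetical description of a neural codec's discrete output."""
    name: str
    sample_rate_hz: int
    hop_length: int
    n_codebooks: int
    codebook_size: int

    def token_grid(self, seconds):
        # Shape of the code matrix for a clip: (codebooks, frames).
        frames = int(seconds * self.sample_rate_hz / self.hop_length)
        return (self.n_codebooks, frames)


# Assumed configurations, for illustration only.
encodec = CodecLayout("EnCodec 24 kHz", 24_000, 320, 8, 1024)
dac = CodecLayout("DAC 44.1 kHz", 44_100, 512, 9, 1024)

for codec in (encodec, dac):
    print(codec.name, codec.token_grid(seconds=10))
# prints:
# EnCodec 24 kHz (8, 750)
# DAC 44.1 kHz (9, 861)
```

Even where the shapes happened to match, the indices would point into different learned codebooks, so a language model trained on one codec's tokens cannot be expected to read or generate the other's.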