facebookresearch / encodec

State-of-the-art deep-learning-based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
MIT License

Audio Codes "at utterance level" #83

Open · juanilarregui opened this issue 6 months ago

juanilarregui commented 6 months ago

❓ Questions

I'm interested in using the encoder to encode an audio fragment of a few seconds into a single codebook vector. However, the model returns a sequence of audio_codes, one set per frame (of course, this is the only way to successfully decode the audio afterwards).
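To make the size of that sequence concrete, here is a minimal sketch that estimates how many codes a clip produces. It assumes the published parameters of the 24 kHz EnCodec model: a 320-sample hop (75 frames per second) and 1024-entry codebooks (10 bits per code), with the number of residual codebooks set by the target bandwidth. The frame-count formula `ceil(samples / 320)` is an assumption about input padding, and the helper names are hypothetical, not part of the `encodec` package.

```python
from __future__ import annotations

import math

# Assumed 24 kHz EnCodec parameters (see lead-in): 320-sample hop -> 75 frames/s,
# 1024-entry codebooks -> 10 bits per code.
SAMPLE_RATE = 24_000
HOP_LENGTH = 320
BITS_PER_CODE = 10


def frame_rate() -> float:
    """Frames per second produced by the encoder."""
    return SAMPLE_RATE / HOP_LENGTH


def num_codebooks(bandwidth_kbps: float) -> int:
    """Residual codebooks per frame needed to reach a target bandwidth."""
    bits_per_frame = bandwidth_kbps * 1000 / frame_rate()
    return int(bits_per_frame // BITS_PER_CODE)


def code_grid_shape(duration_s: float, bandwidth_kbps: float) -> tuple[int, int]:
    """(codebooks, frames) for one mono clip, assuming frames = ceil(samples / hop)."""
    samples = round(duration_s * SAMPLE_RATE)
    frames = math.ceil(samples / HOP_LENGTH)
    return num_codebooks(bandwidth_kbps), frames


if __name__ == "__main__":
    # A 3-second clip encoded at 6 kbps
    print(code_grid_shape(3.0, 6.0))  # (8, 225)
```

Under these assumptions, even a few seconds of audio yields hundreds of frames, each carrying several code indices, which is why the output is a grid rather than a single vector.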

How would you recommend using the encoder, and/or pre- or post-processing the audio input or the audio_codes, to obtain a single audio code "at utterance level"?

Thanks in advance.
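One generic option, not something the EnCodec authors prescribe, is to pool frame-level embeddings over time into one utterance-level vector and, if a discrete code is required, snap that vector to the nearest entry of a separate utterance-level codebook. The sketch below assumes you already have continuous frame embeddings as a `[frames][dim]` list (for example the encoder output before quantization; how to extract it depends on the EnCodec interface you use). The function names and the toy codebook are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> list[float]:
    """Average frame-level embeddings ([frames][dim]) into one utterance vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same dimension")
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def nearest_code(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def sq_dist(entry: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vector, entry))

    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


if __name__ == "__main__":
    # Toy 2-D embeddings for 4 frames (real EnCodec embeddings are much wider)
    frames = [[1.0, 0.0], [3.0, 2.0], [1.0, 2.0], [3.0, 0.0]]
    utterance = mean_pool(frames)
    print(utterance)  # [2.0, 1.0]
    # Hypothetical utterance-level codebook, e.g. from k-means over pooled vectors
    codebook = [[0.0, 0.0], [2.0, 1.0], [5.0, 5.0]]
    print(nearest_code(utterance, codebook))  # 1
```

Mean pooling discards temporal order, so the pooled vector (or its code) suits comparison or classification, not reconstruction: it cannot be decoded back into the original audio. EnCodec's own codebooks are learned for frame-level residuals, which is why this sketch uses a separate codebook instead of reusing them.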