Encoding Long Audio Clips

facebookresearch / encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

MIT License


Encoding Long Audio Clips #71

Open aviaefrat opened 1 year ago

aviaefrat commented 1 year ago

I need the EnCodec tokens of long audio clips (hours long). Inputting such files as-is results in a CUDA OOM error. I've seen that you "do not try to be smart about long files". Does naively chunking the long audio files (and concatenating the EnCodec tokens post hoc) produce results identical to feeding the entire file to the model? If not, how should I chunk my audio files?

julien-blanchon commented 6 months ago

Hey @aviaefrat, did you find an elegant solution to this?

foreverhell commented 3 months ago

I have tried splitting the long audio files naively and concatenating the EnCodec tokens, but the resulting tokens are not consistent with those from the whole file, except for the first clip. I do not know how to make them match.
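
One thing that might be worth trying is chunking with some left context: encode each chunk together with a stretch of the audio that precedes it, then throw away the tokens that belong to that stretch. A plausible reason only the first clip matches is that every later chunk starts "cold", without the audio the model would otherwise have seen before it. Below is a minimal sketch of the bookkeeping only, not of the model call. `plan_chunks`, `encode_chunked` and `encode_fn` are hypothetical names, not part of the encodec package, and the hop of 320 samples per token frame assumes the 24 kHz model (24000 samples/s at 75 frames/s). I also assume the encoder emits one frame per hop, with a final partial frame at the end of the file.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: 24 kHz model, 75 token frames per second -> 320 samples per frame.
HOP = 320


def plan_chunks(num_samples: int, chunk_frames: int, context_frames: int,
                hop: int = HOP) -> List[Tuple[int, int, int]]:
    """Return (start, end, skip) triples.

    Encode samples[start:end], then drop the first `skip` frames, which
    only served as left context. All starts are multiples of `hop`, so
    chunk frames line up with the frames of the whole file.
    """
    if chunk_frames <= 0 or context_frames < 0:
        raise ValueError("chunk_frames must be > 0 and context_frames >= 0")
    total_frames = -(-num_samples // hop)  # ceiling division
    plans = []
    frame = 0
    while frame < total_frames:
        ctx = min(context_frames, frame)  # the first chunk has no context
        start = (frame - ctx) * hop
        end = min((frame + chunk_frames) * hop, num_samples)
        plans.append((start, end, ctx))
        frame += chunk_frames
    return plans


def encode_chunked(samples: Sequence, encode_fn: Callable[[Sequence], list],
                   chunk_frames: int, context_frames: int,
                   hop: int = HOP) -> list:
    """Encode chunk by chunk and concatenate the non-context frames.

    `encode_fn` is a hypothetical wrapper around the real model that maps
    a run of samples to a list with one entry per token frame.
    """
    tokens = []
    for start, end, skip in plan_chunks(len(samples), chunk_frames,
                                        context_frames, hop):
        frames = encode_fn(samples[start:end])
        tokens.extend(frames[skip:])
    return tokens
```

With a real model, `encode_fn` would wrap the model's encode call for one chunk and return its per-frame codes. I would not expect bit-identical tokens even then: if the encoder carries recurrent state, no finite amount of context reproduces the whole-file run exactly, so it is worth comparing the chunked output against a whole-file encode on a clip short enough to fit in memory and increasing `context_frames` until the mismatch rate is acceptable.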