aviaefrat opened this issue 1 year ago
I need the EnCodec tokens of long audio clips (hours long). Feeding such files to the model as-is results in a CUDA OOM error. I've seen that you "do not try to be smart about long files". Does chunking the long audio files naively (and concatenating the EnCodec tokens post hoc) produce results identical to feeding the entire file to the model? If not, how should I chunk my audio files?
Hey @aviaefrat, did you find an elegant solution to this?
I have tried splitting the long audio files naively and concatenating the EnCodec tokens, but the results are inconsistent for every clip except the first. I don't know how to make them match.
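One way to see why naive chunking drifts at the boundaries, and what a common mitigation looks like, is the minimal stdlib-only sketch below. It assumes a hypothetical `encode` callable that emits one token per `hop` samples (for the 24 kHz EnCodec model that would be 320 samples, i.e. 75 frames per second). It feeds each chunk to the encoder with `context_frames` of real neighbouring audio on both sides, then discards the tokens produced for that context ("overlap and trim"). `encode_chunked`, `make_toy_encoder`, `chunk_frames` and `context_frames` are illustrative names, not part of the EnCodec API. The toy encoder has a hard, finite receptive field, so overlap makes the chunked result exactly equal to the full-file result. EnCodec's encoder also contains LSTM layers (per the EnCodec paper), so its context is not strictly bounded, and overlap should be expected to shrink boundary mismatches rather than guarantee bit-identical tokens. Comparing against a short clip that does fit in memory is a cheap way to check.

```python
from typing import Callable, List, Sequence

# Hypothetical encoder signature: takes samples (length a multiple of `hop`)
# and returns one token per `hop` samples. With the real model this would be a
# wrapper that runs EnCodec on the chunk and returns its codes along time.
Encoder = Callable[[Sequence[int]], List[int]]


def encode_chunked(samples: Sequence[int], encode: Encoder, hop: int,
                   chunk_frames: int, context_frames: int) -> List[int]:
    """Encode `samples` chunk by chunk, keeping `context_frames` of real
    audio on each side of every chunk and trimming the tokens for it."""
    if len(samples) % hop != 0:
        raise ValueError("pad the signal to a multiple of hop first")
    total_frames = len(samples) // hop
    tokens: List[int] = []
    for start in range(0, total_frames, chunk_frames):
        end = min(start + chunk_frames, total_frames)
        # Frame range actually sent to the encoder, including the context.
        lo = max(0, start - context_frames)
        hi = min(total_frames, end + context_frames)
        chunk_tokens = encode(samples[lo * hop:hi * hop])
        # Keep only the tokens that belong to frames [start, end).
        tokens.extend(chunk_tokens[start - lo:end - lo])
    return tokens


def make_toy_encoder(hop: int, receptive_field: int) -> Encoder:
    """Toy stand-in for a convolutional encoder: token t is the sum of the
    samples within `receptive_field` samples on either side of frame t,
    truncated at the edges of whatever chunk it is given."""
    def encode(chunk: Sequence[int]) -> List[int]:
        n_frames = len(chunk) // hop
        return [sum(chunk[max(0, t * hop - receptive_field):
                          (t + 1) * hop + receptive_field])
                for t in range(n_frames)]
    return encode


if __name__ == "__main__":
    hop = 4
    signal = [(i * 7) % 11 - 5 for i in range(400)]  # 100 frames of fake audio
    toy = make_toy_encoder(hop, receptive_field=6)
    full = toy(signal)
    naive = encode_chunked(signal, toy, hop, chunk_frames=16, context_frames=0)
    overlapped = encode_chunked(signal, toy, hop, chunk_frames=16, context_frames=2)
    print(naive == full, overlapped == full)  # prints: False True
```

With `context_frames=0` (naive chunking), the frames next to each cut see truncated context, so only the first chunk's interior matches the full-file encoding exactly. With 2 frames (8 samples) of context, which covers the toy encoder's 6-sample reach, every kept token matches. Peak memory is governed by `chunk_frames + 2 * context_frames` rather than by the file length.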