facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

IndexError: index 4 is out of range #401

Open ElizavetaSedova opened 5 months ago

ElizavetaSedova commented 5 months ago

I'd like to try your pre-trained stereo models, but when I generate a sample I get this error. I'm using MusicGen's demo Jupyter notebook to create audio.

lonzi commented 5 months ago

Please share your callstack, env, other details relevant to reproduce. It could be that you just need to re-install audiocraft, see: https://github.com/facebookresearch/audiocraft?tab=readme-ov-file#installation

AK-uni-git commented 5 months ago

Hello, I get the same error when generating music with a stereo model and multiband diffusion (MBD). Is MBD not supported with stereo models? MBD works with the old mono models, and the stereo models work without MBD. I would just like to combine MBD with a stereo model to get the best output quality. I'm using audiocraft through the following front end: https://github.com/rsxdalv/tts-generation-webui

Python 3.10.9

Main dependency versions:
audiocraft 1.3.0a1
torch 2.1.2+cu121
torchaudio 2.1.2+cu121
xformers 0.0.23.post1

I also tested with torch 2.0.0 and xformers 0.0.20 and got the same error.

Parameters used in the test run:
text: 80s synth pop
melody: None
model: facebook/musicgen-stereo-large
duration: 1
topk: 250
topp: 0
temperature: 1
cfg_coef: 3
seed: 3792762101
use_multi_band_diffusion: True

Callstack:

Traceback (most recent call last):
  File "Audiocraft\torch_201Venv\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
  File "Audiocraft\torch_201Venv\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "Audiocraft\torch_201Venv\lib\site-packages\gradio\blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "Audiocraft\torch_201Venv\lib\site-packages\gradio\blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "Audiocraft\torch_201Venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "Audiocraft\torch_201Venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "Audiocraft\torch_201Venv\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "Audiocraft\torch_201Venv\lib\site-packages\gradio\utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
  File "Audiocraft\tts-generation-webui\src\musicgen\musicgen_tab.py", line 209, in generate
    wav_diffusion = mbd.tokens_to_wav(tokens, 32)
  File "Audiocraft\torch_201Venv\lib\site-packages\audiocraft\models\multibanddiffusion.py", line 188, in tokens_to_wav
    wav_encodec = self.codec_model.decode(tokens)
  File "Audiocraft\torch_201Venv\lib\site-packages\audiocraft\models\encodec.py", line 251, in decode
    emb = self.decode_latent(codes)
  File "Audiocraft\torch_201Venv\lib\site-packages\audiocraft\models\encodec.py", line 259, in decode_latent
    return self.quantizer.decode(codes)
  File "Audiocraft\torch_201Venv\lib\site-packages\audiocraft\quantization\vq.py", line 102, in decode
    quantized = self.vq.decode(codes)
  File "Audiocraft\torch_201Venv\lib\site-packages\audiocraft\quantization\core_vq.py", line 402, in decode
    layer = self.layers[i]
  File "Audiocraft\torch_201Venv\lib\site-packages\torch\nn\modules\container.py", line 293, in __getitem__
    return self._modules[self._get_abs_string_index(idx)]
  File "Audiocraft\torch_201Venv\lib\site-packages\torch\nn\modules\container.py", line 283, in _get_abs_string_index
    raise IndexError(f'index {idx} is out of range')
IndexError: index 4 is out of range

I am coming to the conclusion that multiband diffusion does not support stereo models since the dimensions in the tokens tensor are different between mono and stereo models.
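
To illustrate that conclusion, here is a minimal sketch in plain Python (no audiocraft or torch needed) of how such a mismatch could produce exactly "index 4 is out of range". `ToyResidualQuantizer` is a hypothetical stand-in for the quantizer whose `self.layers[i]` lookup appears in the traceback, and the codebook counts (4 for mono, 8 for stereo) are assumptions chosen to match the failing index, not values read from the model configs.

```python
class ToyResidualQuantizer:
    """Hypothetical stand-in for the quantizer inside the mono codec used by MBD."""

    def __init__(self, n_layers):
        # One "layer" per codebook; each layer here is just a numeric offset.
        self.layers = [10 * i for i in range(n_layers)]

    def decode(self, codes):
        # codes: list of K codebook streams, each a list of T token ids.
        out = [0] * len(codes[0])
        for i, stream in enumerate(codes):
            if i >= len(self.layers):
                # Same message format as the error in the traceback.
                raise IndexError(f"index {i} is out of range")
            layer = self.layers[i]
            out = [o + layer + tok for o, tok in zip(out, stream)]
        return out


mono_codes = [[1, 2, 3] for _ in range(4)]    # assumed: K = 4 codebooks
stereo_codes = [[1, 2, 3] for _ in range(8)]  # assumed: K = 8 (left and right)

quantizer = ToyResidualQuantizer(n_layers=4)
mono_out = quantizer.decode(mono_codes)  # works: 4 streams, 4 layers

try:
    quantizer.decode(stereo_codes)       # fails on the fifth stream (i = 4)
    failed_with = None
except IndexError as err:
    failed_with = str(err)

print(failed_with)
```

The first four streams decode normally; the loop only fails when it reaches a stream for which the decoder has no layer, which is why the reported index equals the number of layers the decoder actually has.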

ilkerb commented 5 months ago

I see the same error with the "facebook/musicgen-stereo-melody-large" model. If I switch to "facebook/musicgen-melody-large", no error is raised. It would be much better to use the stereo model, though.

rsxdalv commented 4 months ago

> I am coming to the conclusion that multiband diffusion does not support stereo models since the dimensions in the tokens tensor are different between mono and stereo models.

It seems to be a bit more complicated. There is code out there that runs MBD on stereo, but reading it suggests that it converts down to mono. I can't test it at the moment, but I'd like to know whether we could simply run MBD on the left and right channels separately, or whether the results would be unusable.

rsxdalv commented 4 months ago

OK, this is what enables MBD for stereo in the official code (concatenating the left and right codes, then recombining the output into stereo): https://github.com/facebookresearch/audiocraft/blame/69fea8b290ad1b4b40d28f92d1dfc0ab01dbab85/demos/musicgen_app.py#L144

@AK-uni-git I'm including this fix in my repo.
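
A rough sketch of the idea described above, using nested Python lists in place of tensors so it runs without audiocraft. Everything here is illustrative: `split_left_right`, `fake_mono_decode` and `decode_stereo` are hypothetical names, the left/right interleaving order is an assumption, and `fake_mono_decode` only stands in for a mono decoder such as the `mbd.tokens_to_wav` call seen in the traceback.

```python
def split_left_right(codes):
    """Split interleaved stereo codes [B][2K][T] into left and right [B][K][T].

    Assumes codebooks alternate left, right, left, right, ...
    """
    left = [item[0::2] for item in codes]
    right = [item[1::2] for item in codes]
    return left, right


def fake_mono_decode(codes):
    """Hypothetical mono decoder: [B][K][T] -> [B][1][T] (one channel per item)."""
    return [[[sum(col) for col in zip(*item)]] for item in codes]


def decode_stereo(codes, mono_decode):
    """Decode stereo codes with a mono-only decoder, one channel at a time."""
    batch = len(codes)
    left, right = split_left_right(codes)
    # Stack left and right along the batch axis: the decoder sees 2B mono items.
    wav = mono_decode(left + right)
    assert all(len(item) == 1 for item in wav)  # decoder output is mono
    # Pair item b (left) with item B + b (right) to form a two-channel result.
    return [[wav[b][0], wav[batch + b][0]] for b in range(batch)]


# One batch item, 4 interleaved codebooks (2 per channel), 2 time steps.
codes = [[[1, 1], [2, 2], [3, 3], [4, 4]]]
stereo = decode_stereo(codes, fake_mono_decode)
print(stereo)
```

Each channel is decoded independently, so the decoder never sees more codebooks than it has layers; whether the two separately decoded channels sound coherent together is a separate question that this sketch does not address.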

tatsuya-takahashi commented 4 months ago

any updates? I'm facing this issue too.

tatsuya-takahashi commented 4 months ago

I'm so sorry, once I updated audiocraft to 1.2.0, this issue was resolved. Thank you.