facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

data dimension error when running the source MBD codes on the compression task #237

Open adagio715 opened 1 year ago

adagio715 commented 1 year ago

Thank you very much for open-sourcing this great work. I wanted to compare EnCodec and MBD on the compression task using the source code on this page. Specifically, in a Colab notebook, I installed audiocraft==1.0.0 from the GitHub repo and then ran this code snippet:

import torch
from audiocraft.models import MultiBandDiffusion
from encodec import EncodecModel
from audiocraft.data.audio import audio_read, audio_write

bandwidth = 6.0  # 1.5, 3.0, 6.0
mbd = MultiBandDiffusion.get_mbd_24khz(bw=bandwidth)
encodec = EncodecModel.encodec_model_24khz()

somepath = 'PATH/TO/MY/WAVFILE'
wav, sr = audio_read(somepath)
with torch.no_grad():
    compressed_encodec = encodec(wav)
    compressed_diffusion = mbd.regenerate(wav, sample_rate=sr)

audio_write('sample_encodec', compressed_encodec.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True)
audio_write('sample_diffusion', compressed_diffusion.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True) 

However, when executing compressed_encodec = encodec(wav) and compressed_diffusion = mbd.regenerate(wav, sample_rate=sr), errors occurred, and both seemed to relate to the dimensions of the loaded audio data. Here is the error for compressed_encodec = encodec(wav): [screenshot of the error]

Here is the error for compressed_diffusion = mbd.regenerate(wav, sample_rate=sr): [screenshot of the error]

Can someone help with this issue? Thanks.

startreker-shzy commented 1 year ago

@adagio715 You can unsqueeze the wav to add a batch dimension, following the encodec repo (https://github.com/facebookresearch/encodec):

wav = convert_audio(wav, sr, model.sample_rate, model.channels)
wav = wav.unsqueeze(0)
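For readers hitting the same dimension error, here is a minimal sketch of the shape fix using only torch. It assumes the loaded audio is a 2-D [channels, time] tensor and that the model expects a 3-D [batch, channels, time] input. The helper name ensure_batched is hypothetical and is not part of audiocraft or encodec.

```python
import torch


def ensure_batched(wav: torch.Tensor) -> torch.Tensor:
    """Return wav as [batch, channels, time] (hypothetical helper).

    Assumption: a 1-D tensor is mono audio [time] and a 2-D tensor is
    [channels, time], so leading singleton dimensions are added as needed.
    """
    if wav.dim() == 1:
        wav = wav.unsqueeze(0)  # [time] -> [1, time]
    if wav.dim() == 2:
        wav = wav.unsqueeze(0)  # [channels, time] -> [1, channels, time]
    if wav.dim() != 3:
        raise ValueError(f"expected 1-D, 2-D or 3-D audio, got {wav.dim()}-D")
    return wav


# Stand-in for one second of mono audio at 24 kHz, shaped [channels, time].
wav = torch.zeros(1, 24000)
batched = ensure_batched(wav)
print(batched.shape)  # torch.Size([1, 1, 24000])
```

An input that is already 3-D passes through unchanged, so the helper is safe to call more than once.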

adagio715 commented 1 year ago

@startreker-shzy Thanks, it worked for the encodec model. But for the mbd model, the following error occurred when executing compressed_diffusion = mbd.regenerate(wav, sample_rate=sr): Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor. The fix is to move the input to the same device as the model before calling mbd.regenerate(wav, sample_rate=sr): device = torch.device("cuda" if torch.cuda.is_available() else "cpu"); wav = wav.to(device)
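The device mismatch above can be reproduced and avoided with any torch module. The sketch below uses a small nn.Conv1d as a stand-in for the model; to_module_device is a hypothetical helper, and whether the MBD object exposes its parameters this way is an assumption not confirmed in this thread.

```python
import torch
from torch import nn


def to_module_device(wav: torch.Tensor, module: nn.Module) -> torch.Tensor:
    """Move wav to the device holding the module's weights (hypothetical helper)."""
    device = next(module.parameters()).device
    return wav.to(device)


# Same device selection as in the comment above.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model: a single 1-D convolution placed on the chosen device.
conv = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3).to(device)

# Audio tensors are created on the CPU by default, which is what triggers
# the "Input type ... and weight type ..." error when the weights are on GPU.
wav = torch.zeros(1, 1, 24000)

with torch.no_grad():
    out = conv(to_module_device(wav, conv))
```

Reading the device from the module's own parameters avoids hard-coding "cuda" in more than one place.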

startreker-shzy commented 1 year ago

@adagio715 Thanks for sharing. I can now get the regenerated output, but the sound has white noise. See #236.

adagio715 commented 1 year ago

@startreker-shzy Same here. I tried the scaling method given under #236, and it works. However, the MBD outputs only have energy up to ~6 kHz; the signal is very weak above 6 kHz. Did you also experience this issue?
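One way to put a number on the missing high-frequency content is to measure the fraction of spectral energy above a cutoff. This is a rough sketch using torch.fft. The helper high_band_energy_fraction and the 6 kHz default are illustrative assumptions, not part of audiocraft, and the sine waves stand in for real model outputs.

```python
import math

import torch


def high_band_energy_fraction(
    wav: torch.Tensor, sample_rate: int, cutoff_hz: float = 6000.0
) -> float:
    """Fraction of spectral energy at or above cutoff_hz (hypothetical helper)."""
    # Collapse any batch/channel dimensions into a single mono signal.
    mono = wav.reshape(-1, wav.shape[-1]).mean(dim=0)
    power = torch.fft.rfft(mono).abs() ** 2
    freqs = torch.fft.rfftfreq(mono.shape[-1], d=1.0 / sample_rate)
    total = power.sum()
    if total == 0:
        return 0.0  # silent input: no energy in any band
    return (power[freqs >= cutoff_hz].sum() / total).item()


# Synthetic check: a 1 kHz tone sits below the cutoff, an 8 kHz tone above it.
sr = 24000
t = torch.arange(sr) / sr
low_tone = torch.sin(2 * math.pi * 1000 * t)
high_tone = torch.sin(2 * math.pi * 8000 * t)

low_fraction = high_band_energy_fraction(low_tone, sr)
high_fraction = high_band_energy_fraction(high_tone, sr)
```

Comparing this fraction for the original file, the EnCodec output, and the MBD output would show whether the roll-off is specific to MBD or already present in the input.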