data dimension error when running the source MBD codes on the compression task

adagio715 commented 1 year ago

Thank you very much for open-sourcing this great work. I wanted to compare Encodec and MBD on the compression task using the source code on this page. Specifically, in a Colab notebook, I installed audiocraft==1.0.0 from the GitHub repo and then ran this code snippet:

import torch
from audiocraft.models import MultiBandDiffusion
from encodec import EncodecModel
from audiocraft.data.audio import audio_read, audio_write

bandwidth = 6.0  # 1.5, 3.0, 6.0
mbd = MultiBandDiffusion.get_mbd_24khz(bw=bandwidth)
encodec = EncodecModel.encodec_model_24khz()

somepath = 'PATH/TO/MY/WAVFILE'
wav, sr = audio_read(somepath)
with torch.no_grad():
    compressed_encodec = encodec(wav)
    compressed_diffusion = mbd.regenerate(wav, sample_rate=sr)

audio_write('sample_encodec', compressed_encodec.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True)
audio_write('sample_diffusion', compressed_diffusion.squeeze(0).cpu(), mbd.sample_rate, strategy="loudness", loudness_compressor=True)

However, when executing compressed_encodec = encodec(wav) and compressed_diffusion = mbd.regenerate(wav, sample_rate=sr), errors occurred, and both seemed to relate to the dimensions of the loaded audio data. Here is the error for compressed_encodec = encodec(wav):

Here is the error for compressed_diffusion = mbd.regenerate(wav, sample_rate=sr):

Can someone help with this issue? Thanks.

startreker-shzy commented 1 year ago

@adagio715 You can add a batch dimension to the wav with unsqueeze, following the encodec repo (https://github.com/facebookresearch/encodec): wav = convert_audio(wav, sr, model.sample_rate, model.channels); wav = wav.unsqueeze(0). Here model is the Encodec model (encodec in your snippet).
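To illustrate the shape change suggested above, here is a minimal sketch. It assumes audio_read returns a tensor shaped [channels, time], as the errors in this thread suggest, and that the model expects [batch, channels, time]. The helper name add_batch_dim is hypothetical and not part of audiocraft or encodec, and a zero tensor stands in for the loaded audio.

```python
import torch


def add_batch_dim(wav: torch.Tensor) -> torch.Tensor:
    """Return wav shaped [batch, channels, time] (hypothetical helper)."""
    if wav.dim() == 1:
        # [time] -> [1, 1, time]
        return wav[None, None, :]
    if wav.dim() == 2:
        # [channels, time] -> [1, channels, time]
        return wav.unsqueeze(0)
    if wav.dim() == 3:
        # Already batched, leave unchanged.
        return wav
    raise ValueError(f"Unexpected waveform shape: {tuple(wav.shape)}")


# Stand-in for the tensor returned by audio_read: mono, one second at 24 kHz.
wav = torch.zeros(1, 24000)
batched = add_batch_dim(wav)
print(tuple(batched.shape))
```

The same batched tensor can then be passed to both encodec(wav) and mbd.regenerate(wav, sample_rate=sr).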

adagio715 commented 1 year ago

@startreker-shzy Thanks, it worked for the encodec model. But for the mbd model, the following error occurred when executing compressed_diffusion = mbd.regenerate(wav, sample_rate=sr): Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor. To solve this error, run device = torch.device("cuda" if torch.cuda.is_available() else "cpu"); wav = wav.to(device) before calling compressed_diffusion = mbd.regenerate(wav, sample_rate=sr).
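The device mismatch can be reproduced and fixed without MBD itself. The sketch below uses a small torch.nn.Conv1d as a stand-in for the model (an assumption made only so the example is self-contained) and moves the input to whatever device the weights live on before the forward pass.

```python
import torch

# Pick the GPU when available, as in the fix described above.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the real model; its weights live on `device`.
model = torch.nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3).to(device)

# Audio loaded from disk starts on the CPU, shaped [batch, channels, time].
wav = torch.zeros(1, 1, 24000)

# Move the input to the device holding the weights before calling the model.
model_device = next(model.parameters()).device
wav = wav.to(model_device)

with torch.no_grad():
    out = model(wav)

print(out.device, tuple(out.shape))
```

Moving the result back with .cpu() before writing it to disk, as the original snippet already does, keeps the rest of the pipeline unchanged.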

startreker-shzy commented 1 year ago

@adagio715 Thanks for sharing. I can now get the regenerated output, but the sound has white noise. See #236.

adagio715 commented 1 year ago

@startreker-shzy Same here. I tried the scaling method given under #236, and it works. However, the mbd outputs have energy only up to ~6 kHz, and the signal is very weak above 6 kHz. Did you also experience this issue?
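One way to put a number on the "weak above 6 kHz" observation is to compare spectral energy above and below a cutoff. This is a rough sketch, not something taken from audiocraft: the function name high_band_energy_ratio and the 6 kHz default are assumptions for illustration, and synthetic sine waves stand in for real model output.

```python
import math

import torch


def high_band_energy_ratio(wav: torch.Tensor, sample_rate: int,
                           cutoff_hz: float = 6000.0) -> float:
    """Fraction of spectral energy at or above cutoff_hz (hypothetical helper)."""
    # Power spectrum along the time axis (last dimension).
    power = torch.fft.rfft(wav, dim=-1).abs() ** 2
    # Frequency in Hz of each rfft bin.
    freqs = torch.fft.rfftfreq(wav.shape[-1], d=1.0 / sample_rate)
    total = power.sum().clamp_min(1e-12)
    high = power[..., freqs >= cutoff_hz].sum()
    return (high / total).item()


sample_rate = 24000
t = torch.arange(sample_rate) / sample_rate  # one second of samples

low_tone = torch.sin(2 * math.pi * 1000.0 * t)[None, None, :]   # 1 kHz
high_tone = torch.sin(2 * math.pi * 9000.0 * t)[None, None, :]  # 9 kHz

ratio_low = high_band_energy_ratio(low_tone, sample_rate)
ratio_high = high_band_energy_ratio(high_tone, sample_rate)
print(ratio_low, ratio_high)
```

Running the same function on the original clip and on the MBD output would show whether the high band is attenuated relative to the input.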

facebookresearch / audiocraft

data dimension error when running the source MBD codes on the compression task #237