facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Training Help - Error opening file ... : RuntimeError('cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous') #462

Open · tomhaydn opened this issue 6 months ago

tomhaydn commented 6 months ago

Hi, I'm trying to train a new model from scratch via MusicGen on a new dataset, and I'm finding that the docs are quite difficult to follow.

Please see my folder structure and approach:

The command to initiate training:

```shell
!cd "audiocraft" && dora run -d solver=musicgen/musicgen_base_test_1 dset=audio/test_1
```

I have configured my custom solver `config/solver/musicgen/musicgen_base_test_1.yaml`:

```yaml
# @package __global__

# This is the training loop solver
# for the base MusicGen model (text-to-music)
# on monophonic audio sampled at 32 kHz
defaults:
  - musicgen/default
  - /model: lm/musicgen_lm
  - override /dset: audio/default
  - _self_

autocast: true
autocast_dtype: float16

# EnCodec large trained on mono-channel music audio sampled at 32khz
# with a total stride of 640 leading to 50 frames/s.
# rvq.n_q=4, rvq.bins=2048, no quantization dropout
# (transformer_lm card and n_q must be compatible)
compression_model_checkpoint: //pretrained/facebook/encodec_32khz

channels: 1
sample_rate: 32000

deadlock:
  use: true  # deadlock detection

dataset:
  batch_size: 4 # 32 GPUs
  sample_on_weight: false  # Uniform sampling all the way
  sample_on_duration: false  # Uniform sampling all the way

generate:
  lm:
    use_sampling: true
    top_k: 250
    top_p: 0.0

optim:
  epochs: 5
  optimizer: dadam
  lr: 1
  ema:
    use: true
    updates: 10
    device: cuda

logging:
  log_tensorboard: true

schedule:
  lr_scheduler: cosine
  cosine:
    warmup: 2000
    lr_min_ratio: 0.0
    cycle_length: 1.0
```

I have my dset config `dset/test_1.yaml`:

```yaml
# @package __global__

datasource:
  max_sample_rate: 48000
  max_channels: 2

  train: egs/test_1/train
  valid: egs/test_1/test
  evaluate: egs/test_1/train
  generate: egs/test_1/test
```

And finally, I have my dataset and data:

```
dataset/test_1/train/data.jsonl
dataset/test_1/test/data.jsonl
```

Both of these look like this:

```json
{"path": "dataset/test_1/115775.mp3", "duration": 181, "sample_rate": 48000, "amplitude": null, "weight": null, "info_path": null}
...
...
```

Each audio file also has a 'manifest' file of the form:

```json
{"key": "A#", "artist": "Alec K. Redfearn & the Eyesores", "sample_rate": 44100, "file_extension": "mp3", "description": "Folk", "keywords": ["Folk"], "duration": 182, "bpm": 103, "genre": "Folk", "title": "Ohio", "name": "Ohio", "instrument": "mix", "moods": ["Folk"]}
```

I can adjust this as needed, but I want to get training working before I mess with parameters.
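
For reference, here is a sketch of one way to emit such a per-track JSON; the `write_track_metadata` helper and its defaults are my own illustration, not an audiocraft API:

```python
# Sketch: write a MusicGen-style metadata JSON next to an audio file.
# The helper and its default fields are illustrative, not an audiocraft API.
import json
from pathlib import Path

def write_track_metadata(audio_path: str, **fields) -> None:
    p = Path(audio_path)
    meta = {"title": p.stem, "name": p.stem,
            "file_extension": p.suffix.lstrip("."), **fields}
    p.with_suffix(".json").write_text(json.dumps(meta))

write_track_metadata(
    "dataset/test_1/115775.mp3",
    key="A#", artist="Alec K. Redfearn & the Eyesores",
    sample_rate=44100, description="Folk", keywords=["Folk"],
    duration=182, bpm=103, genre="Folk", instrument="mix", moods=["Folk"],
)
```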

Everything runs fine until it hits an error:

```
Error opening file ... : RuntimeError('cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous')
```
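
For what it's worth, here is a rough sanity check over the manifest (a sketch using torchaudio; the paths follow my layout above, and none of this is an audiocraft API):

```python
# Sketch: verify every entry in the manifest can actually be opened and that
# the stored sample_rate matches the file on disk. Assumes torchaudio with an
# mp3-capable backend; paths follow the layout above.
import json
import torchaudio

with open("egs/test_1/train/data.jsonl") as f:
    for line in f:
        entry = json.loads(line)
        try:
            info = torchaudio.info(entry["path"])
        except Exception as e:
            print(f"cannot open {entry['path']}: {e}")
            continue
        if info.sample_rate != entry["sample_rate"]:
            print(f"{entry['path']}: manifest says {entry['sample_rate']} Hz, "
                  f"file is {info.sample_rate} Hz")
```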

Thanks in advance for any help with this particular issue; I'd also appreciate any general tips on anything else I might be doing wrong. I really want to get a working model that isn't restricted by the license.

DEBIHOOD commented 6 months ago

I've only done tests with unconditional training, because I didn't need conditioning on text; I just wanted to condition on the artist's ID (like it was done in OpenAI's Jukebox), but I haven't figured that out, so I've gone with unconditional training.

One thing that caught my eye immediately is that you've specified training at 48 kHz in your dset config file:

```yaml
datasource:
  max_sample_rate: 48000
  max_channels: 2
```

The problem is that, as far as I know, FAIR didn't publish EnCodec weights for 48 kHz music, or if they did, you'll need to tweak more things to make it work. So you might want to just set it to 32000; your dataset is mostly 44100 anyway. Plus, the artifacts of the quantization -> dequantization round trip that EnCodec introduces are far more impactful than losing some of those upper-end frequencies, given hearing decay related to aging, bad headphones/speakers/DACs and so on... I've covered this in the other issue here about EnCodec being 32 kHz instead of 44.1 kHz, so you can find more info there.

You're also going to have an easier time getting training to start if you train in mono, because stereo requires additional tinkering to set up, so just set `max_channels: 1`.
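
If you want to just convert your files to 32 kHz mono offline, a minimal sketch (mine, using torchaudio; the file names are placeholders):

```python
# Sketch: downmix a track to mono and resample it to 32 kHz so the audio
# matches the pretrained 32 kHz EnCodec checkpoint. File names are placeholders.
from pathlib import Path
import torchaudio

wav, sr = torchaudio.load("dataset/test_1/115775.mp3")
wav = wav.mean(dim=0, keepdim=True)  # stereo -> mono
if sr != 32000:
    wav = torchaudio.functional.resample(wav, sr, 32000)
Path("dataset/test_1_32k").mkdir(parents=True, exist_ok=True)
torchaudio.save("dataset/test_1_32k/115775.wav", wav, 32000)
```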


> And finally, I have my dataset and data:
>
> dataset/test_1/train/data.jsonl
> dataset/test_1/test/data.jsonl

Huh, interesting, because I don't have these data.jsonl files in my dataset folder. Also, because I train unconditionally, I don't have the JSON files with info on BPM, description and so on, but that's unrelated; I just have a folder full of mp3s. There are two things that might have happened: 1) maybe you mistyped it and it's actually located in `egs/test_1/train/data.jsonl`, or 2) you've placed your data.jsonl file in the wrong place; it shouldn't be in `dataset/test_1/train/data.jsonl`. Move it to `egs/test_1/train/data.jsonl`, and do the same for valid and the others too.

If it's the second case, then it doesn't know where to look for the audio files. Try everything I've suggested and it'll probably run without any issues; if not, I'll try to assist you further with the problem.

tomhaydn commented 6 months ago

My issue was that I wrote a script separately to generate the manifest file (and the subsequent train/test split), without realising that there was already a built-in one, duh. The problem was that some sample rates were incorrect.

```shell
!python -m audiocraft.data.audio_dataset audiocraft/dataset/test_1 egs/test_1_new/data.jsonl
```
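
In case it helps anyone else: the built-in writes a single data.jsonl, so splitting it back into train/test manifests can be done with a quick script like this (the 90/10 ratio and output paths are my own choices):

```python
# Sketch: shuffle the generated manifest and split it into train/test
# manifests. Split ratio and paths are arbitrary choices.
import random
from pathlib import Path

lines = Path("egs/test_1_new/data.jsonl").read_text().splitlines()
random.seed(42)
random.shuffle(lines)
cut = int(0.9 * len(lines))
for name, chunk in [("train", lines[:cut]), ("test", lines[cut:])]:
    out = Path(f"egs/test_1/{name}")
    out.mkdir(parents=True, exist_ok=True)
    (out / "data.jsonl").write_text("\n".join(chunk) + "\n")
```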

> Maybe you mistyped it and it's actually located in egs/test_1/train/data.jsonl?

This was correct, my mistake.

DEBIHOOD commented 6 months ago

> My issue was that I wrote a script separately to generate the manifest file (and the subsequent train/test split), without realising that there was already a built-in one, duh. The problem was that some sample rates were incorrect.

This! I've done exactly the same thing. My DIY script worked, but I realized it wasn't needed after I had already started the training process and began re-reading the docs from the ground up, just in case I'd missed something. That's how I discovered that there was a built-in solution.