crlandsc / tiny-audio-diffusion

A repository for generating and training short audio samples with unconditional waveform diffusion on accessible consumer hardware (<2GB VRAM GPU)
https://towardsdatascience.com/tiny-audio-diffusion-ddc19e90af9b
MIT License

"Environment variable 'DIR_LOGS' not found" #1

wwerkk closed this issue 1 year ago

wwerkk commented 1 year ago

When running the training script:

python train.py exp=drum_diffusion trainer.gpus=1 exp=drum_diffusion datamodule.dataset.path=data/doof

I get the following error:

    raise KeyError(f"Environment variable '{key}' not found")
omegaconf.errors.InterpolationResolutionError: KeyError raised while resolving interpolation: "Environment variable 'DIR_LOGS' not found"
    full_key: hydra.run.dir
    object_type=dict

From what I understand, the problem might be with setting the logs directory, but I'm not sure how to go about it, since I don't have much experience with Hydra.

wwerkk commented 1 year ago

My bad - if I had gone through the README properly, I would've read this part :) https://github.com/crlandsc/tiny-audio-diffusion#3-define-environment-variables
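
For anyone else who skipped that step: the fix is a .env file at the repo root, roughly along these lines (a sketch only - DIR_LOGS is the variable named in the traceback, the remaining keys follow that README section, and all values are placeholders):

    # placeholders - see README step 3 for the exact keys
    DIR_LOGS=/path/to/logs
    DIR_DATA=/path/to/data
    WANDB_PROJECT=tiny-audio-diffusion
    WANDB_ENTITY=<your-wandb-username>
    WANDB_API_KEY=<your-wandb-api-key>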

Which brings me to the question: is there a way to train/fine-tune a model without using wandb?
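
In the meantime, one workaround that appears to keep everything local (assuming the repo uses the stock wandb logger) is to run wandb in offline mode, so nothing syncs to the cloud:

    # offline mode: wandb logs locally and uploads nothing
    WANDB_MODE=offline python train.py exp=drum_diffusion trainer.gpus=1 datamodule.dataset.path=data/doof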

I'm still having some issues with the checkpoint not being saved. Wandb claims it has been saved and gives a path, but the directories in that path do not actually exist.

crlandsc commented 1 year ago

Hi @wwerkk - Thanks for bringing this to my attention. I added functionality to train without wandb. You will have to pull the repo to get the updated module file. You can then use the new drum_diffusion_no_wandb.yaml to train without wandb (it is just drum_diffusion.yaml with the loggers and audio_samples_logger sections deleted).

On a related note, wandb doesn't actually save the checkpoints; they are saved via PyTorch Lightning under the logs/ckpts folder on your local machine. Wandb is there mainly to keep track of metrics and generate audio outputs so you can hear how your model is performing.
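
A quick sanity check once the first validation pass has run - Lightning checkpoints are .ckpt files, so (assuming the default folder layout) something like this should list them:

    # list any checkpoints Lightning has written locally
    find logs -name "*.ckpt"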

Let me know if this fixes the problem!

wwerkk commented 1 year ago

My problem turned out to be related to the length of training - the checkpoint file does indeed get saved to the proper path once validation runs :)

Just tested the drum_diffusion_no_wandb.yaml config, and it seems it still requires a valid wandb username and API key, and the wandb run gets created as normal. Do I understand correctly that the difference between the configs is that with no_wandb the training history does not get synchronized with the cloud?

crlandsc commented 1 year ago

Glad you figured out the issue! You can change how often checkpoints are logged via line 16 of the exp/drum_diffusion.yaml or exp/drum_diffusion_no_wandb.yaml files:

val_log_every_n_steps: 1000 # Logging interval (Validation and audio generation every n steps)

Yes, that is correct about the configs. The new exp/drum_diffusion_no_wandb.yaml config basically just takes out anything related to wandb logging and keeps everything local.
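
If you want checkpoints sooner while fine-tuning on a small dataset, lower that interval - edit the file by hand, or as a quick sketch (GNU sed; on macOS use sed -i ''):

    # validate (and therefore checkpoint) every 100 steps instead of every 1000
    sed -i 's/val_log_every_n_steps: 1000/val_log_every_n_steps: 100/' exp/drum_diffusion_no_wandb.yaml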

I am puzzled why it is still requiring wandb credentials, though. Did you pull the repo to update the main/diffusion_module.py file before training again? I had to make a small tweak in that file to make the exp/drum_diffusion_no_wandb.yaml config work.
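
That is, from your local clone:

    # pull the latest changes so main/diffusion_module.py includes the fix
    git pull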

wwerkk commented 1 year ago

Seems like another newbie mistake: the command I was running had the exp argument doubled, with the second still pointing to the drum_diffusion.yaml config, so it overrode the first.
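
For the record, the corrected invocation (exp given exactly once, pointing at the no-wandb config):

    python train.py exp=drum_diffusion_no_wandb trainer.gpus=1 datamodule.dataset.path=data/doof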

Re: "You can change how often checkpoints are logged via line 16 of the exp/drum_diffusion.yaml or exp/drum_diffusion_no_wandb.yaml files." That's what I was thinking - that saving occurs on validation. When fine-tuning on small amounts of data, it makes sense that it takes a while for validation (and therefore checkpointing) to happen!

Speaking of dataset sizes - approximately how large were the datasets the kicks/snares models were trained on? Just wondering what the lower bound on data would be for training/tuning.

Thank you so much for the responses, I'm very much looking forward to some finely-tuned generation soon :)

crlandsc commented 1 year ago

I actually realized that I had the exp argument doubled in the README instructions when you first opened this issue, and I fixed it then - so that one is on me!

The kicks/snares/hi-hats datasets were pretty small, as I was just working with some open-source samples that I had gathered (150-200 samples). So there is definitely a lot of room for fine-tuning! The Percussion model, however, was trained on over 1,000 samples, which is a more appropriate dataset size. These were just the first proof-of-concept models, and hopefully I can train some better, more diverse models in the future when I collect more data.

Please share any models that you train, I'm incredibly interested to hear what you come up with!

crlandsc commented 1 year ago

If everything is up and running now, I will close this issue. Have fun training!