crlandsc / tiny-audio-diffusion

A repository for generating and training short audio samples with unconditional waveform diffusion on accessible consumer hardware (<2GB VRAM GPU)
https://towardsdatascience.com/tiny-audio-diffusion-ddc19e90af9b
MIT License

Training Freezes Before Starting #2

Closed: yukiarimo closed this issue 2 days ago

yukiarimo commented 1 month ago
(tiny-audio-diffusion) yuki@yuki tiny-audio-diffusion % python train.py exp=drum_diffusion trainer.gpus=1 datamodule.dataset.path=/Users/yuki/Downloads/tiny-audio-diffusion/samples
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2024-05-11 02:16:26,217][main.utils][INFO] - Disabling python warnings! <config.ignore_warnings=True>
Global seed set to 12345
[2024-05-11 02:16:26,220][__main__][INFO] - Instantiating datamodule <main.diffusion_module.Datamodule>.
[2024-05-11 02:16:27,005][__main__][INFO] - Instantiating model <main.diffusion_module.Model>.
[2024-05-11 02:16:27,183][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.RichProgressBar>.
[2024-05-11 02:16:27,183][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.ModelCheckpoint>.
[2024-05-11 02:16:27,185][__main__][INFO] - Instantiating callback <pytorch_lightning.callbacks.RichModelSummary>.
[2024-05-11 02:16:27,186][__main__][INFO] - Instantiating callback <main.diffusion_module.SampleLogger>.
[2024-05-11 02:16:27,187][__main__][INFO] - Instantiating logger <pytorch_lightning.loggers.wandb.WandbLogger>.
wandb: Currently logged in as: yukiarimo. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.17.0
wandb: Run data is saved locally in /Users/yuki/Downloads/tiny-audio-diffusionlogs/wandb/run-20240511_021628-7k1pjexi
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run unconditional_diffusion
wandb: ⭐️ View project at https://wandb.ai/yukiarimo/wandbprojectname
wandb: πŸš€ View run at https://wandb.ai/yukiarimo/wandbprojectname/runs/7k1pjexi
[2024-05-11 02:16:33,399][__main__][INFO] - Instantiating trainer <pytorch_lightning.Trainer>.
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2024-05-11 02:16:33,438][__main__][INFO] - Logging hyperparameters!
[2024-05-11 02:16:33,456][__main__][INFO] - Starting training.
┏━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃   ┃ Name                ┃ Type           ┃ Params ┃
┑━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━┩
β”‚ 0 β”‚ model               β”‚ DiffusionModel β”‚ 31.6 M β”‚
β”‚ 1 β”‚ model.net           β”‚ Module         β”‚ 31.6 M β”‚
β”‚ 2 β”‚ model.diffusion     β”‚ VDiffusion     β”‚ 31.6 M β”‚
β”‚ 3 β”‚ model.sampler       β”‚ VSampler       β”‚ 31.6 M β”‚
β”‚ 4 β”‚ model_ema           β”‚ EMA            β”‚ 63.1 M β”‚
β”‚ 5 β”‚ model_ema.ema_model β”‚ DiffusionModel β”‚ 31.6 M β”‚
β””β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Trainable params: 31.6 M                                                        
Non-trainable params: 31.6 M                                                    
Total params: 63.1 M                                                            
Total estimated model params size (MB): 126                                     
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
(the four lines above repeat 8 times, with no further output)
crlandsc commented 1 month ago

Hi @yukiarimo. What you shared are standard startup logs, so they don't provide much insight into what the issue might be. I have not tested this repo on MPS, only on NVIDIA GPUs and CPUs, so I would start there (i.e., remove the trainer.gpus=1 argument and run on CPU).
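For reference, a CPU-only invocation would look like the original command minus the GPU override (this is a sketch based on the command shown above; the exp and datamodule override names are taken from the user's log, not verified against the repo's configs):

```shell
# Run training on CPU by omitting trainer.gpus=1
python train.py exp=drum_diffusion \
  datamodule.dataset.path=/Users/yuki/Downloads/tiny-audio-diffusion/samples
```

If that runs, the freeze is likely specific to the MPS backend rather than the training code itself.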