aik2mlj / polyffusion

Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls
https://polyffusion.github.io
MIT License

Error: "File exists:" after creating log directory #5

Closed: drscotthawley closed this issue 2 months ago

drscotthawley commented 6 months ago

Thank you for sharing your code. After completing the installation, I ran what I thought was the appropriate command for training, but it fails: it seems to try to create the same directory twice, and the second attempt raises an error. Is this normal? Do you have any suggestions for fixing it?

Thanks!

$ python polyffusion/main.py --model sdf_chdvnl  --output_dir /runs/shawley/polyffusion
Creating new log folder as /runs/shawley/polyffusion/24-02-14_074902
load train valid set with: {}
Dataloader ready: batch_size=16, num_workers=4, pin_memory=True, train_segments=57543, val_segments=6522 {}
Total parameters: 44686850
model_name: sdf_chdvnl
batch_size: 16
max_epoch: 100
learning_rate: 5.0e-05
max_grad_norm: 10
fp16: true
num_workers: 4
pin_memory: true
in_channels: 2
out_channels: 2
channels: 64
attention_levels:
- 2
- 3
n_res_blocks: 2
channel_multipliers:
- 1
- 2
- 4
- 4
n_heads: 4
tf_layers: 1
d_cond: 1152
linear_start: 0.00085
linear_end: 0.012
n_steps: 1000
latent_scaling_factor: 0.18215
img_h: 128
img_w: 128
cond_type: chord
cond_mode: mix
use_enc: false
chd_n_step: 32
chd_input_dim: 36
chd_z_input_dim: 512
chd_hidden_dim: 512
chd_z_dim: 512

Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/4
Creating new log folder as /runs/shawley/polyffusion/24-02-14_074907
Creating new log folder as /runs/shawley/polyffusion/24-02-14_074907
Traceback (most recent call last):
  File "/home/shawley/diffusion/polyffusion/polyffusion/main.py", line 36, in <module>
    config = LDM_TrainConfig(
  File "/home/shawley/diffusion/polyffusion/polyffusion/train/train_ldm.py", line 31, in __init__
    super().__init__(params, None, output_dir)
  File "/home/shawley/diffusion/polyffusion/polyffusion/train/__init__.py", line 40, in __init__
    os.makedirs(output_dir)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/runs/shawley/polyffusion/24-02-14_074907'
Creating new log folder as /runs/shawley/polyffusion/24-02-14_074907
Traceback (most recent call last):
  File "/home/shawley/diffusion/polyffusion/polyffusion/main.py", line 36, in <module>
    config = LDM_TrainConfig(
  File "/home/shawley/diffusion/polyffusion/polyffusion/train/train_ldm.py", line 31, in __init__
    super().__init__(params, None, output_dir)
  File "/home/shawley/diffusion/polyffusion/polyffusion/train/__init__.py", line 40, in __init__
    os.makedirs(output_dir)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/runs/shawley/polyffusion/24-02-14_074907'
load train valid set with: {}
[rank: 2] Child process with PID 57408 terminated with code 1. Forcefully terminating all other processes to avoid zombies 🧟
Killed

(Before running, the directory /runs/shawley/polyffusion/ was completely empty.)

PS I get a similar "File exists" error when running the command copied from the README:

$ python polyffusion/main.py --model sdf_chd8bar --output_dir result/sdf_chd8bar
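The log above hints at the cause. After the distributed (DDP) launcher starts ("MEMBER: 1/4"), main.py runs again in the worker processes (each traceback starts at main.py line 36). Three of them print the same folder name, 24-02-14_074907, and two of them fail in os.makedirs. The following is a minimal sketch of that collision, not the repo's code. It assumes the log folder name is a one-second-resolution timestamp, with the format string inferred from names like 24-02-14_074907. `create_log_dir` is a hypothetical helper, and the three "ranks" are simulated sequentially rather than run as real processes.

```python
import os
import tempfile
from datetime import datetime


def create_log_dir(output_dir: str, now: datetime) -> str:
    """Hypothetical helper: create a timestamp-named log folder (1 s resolution)."""
    log_dir = os.path.join(output_dir, now.strftime("%y-%m-%d_%H%M%S"))
    print(f"Creating new log folder as {log_dir}")
    # Without exist_ok=True, this raises FileExistsError if the folder already exists.
    os.makedirs(log_dir)
    return log_dir


# Simulate three worker ranks that all start within the same second.
root = tempfile.mkdtemp()
same_second = datetime(2024, 2, 14, 7, 49, 7)
failures = 0
for rank in range(1, 4):
    try:
        create_log_dir(root, same_second)
    except FileExistsError as err:
        failures += 1
        print(f"rank {rank}: {err}")
print(f"{failures} of 3 ranks failed")
```

Only the first rank to reach os.makedirs succeeds. The others hit FileExistsError, which matches the two tracebacks in the log.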
drscotthawley commented 6 months ago

A quick fix is to add the keyword argument exist_ok=True to the offending os.makedirs call on line 40 of train/__init__.py. With that change, the error goes away and the model trains!
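The quick fix works because os.makedirs(..., exist_ok=True) is idempotent: if another process has already created the directory, the call returns without raising. The following is a minimal sketch, assuming a hypothetical `create_log_dir` helper rather than the repo's actual code. Repeated calls on the same path, as several ranks would make, all succeed.

```python
import os
import tempfile


def create_log_dir(log_dir: str) -> str:
    """Hypothetical helper: create the log folder, tolerating an existing one."""
    # exist_ok=True turns "folder already exists" into a no-op instead of an error.
    os.makedirs(log_dir, exist_ok=True)
    return log_dir


root = tempfile.mkdtemp()
target = os.path.join(root, "24-02-14_074907")
# One call per simulated rank; none of them raises FileExistsError.
results = [create_log_dir(target) for _ in range(4)]
print(results[0], os.path.isdir(target))
```

With this change, ranks that compute the same timestamp share one folder. Ranks that start in a different second still get their own folder, as the parent process did with 24-02-14_074902. Note that exist_ok=True still raises FileExistsError if the path exists but is a regular file rather than a directory.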

aik2mlj commented 2 months ago

Thank you! The issue has been fixed in commit 09773076f2cc0033cf39b99e1ecdb305f57d578d.