facebookresearch / dadaptation

D-Adaptation for SGD, Adam and AdaGrad
MIT License

Exception in Lion optimizer #30

Closed. tjennings closed this issue 1 year ago

tjennings commented 1 year ago

Hey, thanks for open-sourcing this. I'm excited to try it out! For context, I have a Stable Diffusion trainer built with Hugging Face's diffusers and accelerate libraries. When I try to use dadapt_lion, I get the following exception:

Traceback (most recent call last): 
  File "/home/coreco/Documents/code/dreamlike-trainer/train.py", line 63, in <module>
    trainer.train()
  File "/home/coreco/Documents/code/dreamlike-trainer/DreamlikeTrainer.py", line 246, in train
    self.step(step, batch)
  File "/home/coreco/Documents/code/dreamlike-trainer/DreamlikeTrainer.py", line 280, in step
    self.te_optimizer.step()
  File "/home/coreco/Documents/code/dreamlike-trainer/venv/lib/python3.10/site-packages/accelerate/optimizer.py", line 140, in step
    self.optimizer.step(closure)
  File "/home/coreco/Documents/code/dreamlike-trainer/venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/coreco/Documents/code/dreamlike-trainer/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/home/coreco/Documents/code/dreamlike-trainer/venv/lib/python3.10/site-packages/dadaptation/dadapt_lion.py", line 141, in step
    logging.info(f"lr: {lr} dlr: {dlr} d_hat: {d_hat}, d: {d}. sk_l1={global_sk_l1:1.1e} numerator_weighted={global_numerator_weighted:1.1e}")
UnboundLocalError: local variable 'd_hat' referenced before assignment
Traceback (most recent call last):
  File "/home/coreco/Documents/code/dreamlike-trainer/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/coreco/Documents/code/dreamlike-trainer/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/coreco/Documents/code/dreamlike-trainer/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 923, in launch_command
    simple_launcher(args)
  File "/home/coreco/Documents/code/dreamlike-trainer/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/coreco/Documents/code/dreamlike-trainer/venv/bin/python3', 'train.py', '--config_path', './local/dadapt_lion_256px.json5']' returned non-zero exit status 1.
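
For readers hitting the same error: the traceback shows only that `d_hat` is read by the logging call before it has been assigned. One common way this happens is that the variable is bound only inside a conditional branch that was skipped on that step. The sketch below reproduces that pattern in isolation. It is not the library's code: the function names, arguments, and the `lr > 0.0` condition are all assumptions made for illustration.

```python
import logging


def estimate_step_buggy(lr, d, numerator, denominator):
    """Hypothetical stand-in: d_hat is bound only when lr > 0 (assumed condition)."""
    if lr > 0.0:
        d_hat = numerator / denominator
        d = max(d, d_hat)
    # The f-string is evaluated before logging.info runs, so this line raises
    # UnboundLocalError whenever the branch above was skipped.
    logging.info(f"lr: {lr} d_hat: {d_hat}, d: {d}")
    return d


def estimate_step_fixed(lr, d, numerator, denominator):
    """Same logic, but d_hat is bound before any branch can skip it."""
    d_hat = d  # assumed fallback: reuse the current estimate
    if lr > 0.0:
        d_hat = numerator / denominator
        d = max(d, d_hat)
    logging.info(f"lr: {lr} d_hat: {d_hat}, d: {d}")
    return d


if __name__ == "__main__":
    print(estimate_step_fixed(0.0, 1e-6, 1.0, 2.0))
    try:
        estimate_step_buggy(0.0, 1e-6, 1.0, 2.0)
    except UnboundLocalError as exc:
        print(f"buggy variant failed: {exc}")
```

If the real cause follows this pattern, the error would appear only on steps where the guarding condition is false, which is worth checking against the learning-rate schedule in use.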