NVIDIA / modulus

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
https://developer.nvidia.com/modulus
Apache License 2.0

🐛[BUG]: LBFGS optimizer doesn't work for PINN training #492

Open hasethinvd opened 4 months ago

hasethinvd commented 4 months ago

Version

24.01

On which installation method(s) does this occur?

Docker, Pip, Source

Describe the issue

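After setting the optimizer to bfgs in the config file, Modulus overrides max_steps to 0, and training then fails with KeyError: 'max_iter' raised inside torch.optim.LBFGS.step (see the log output below).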

Minimum reproducible example

# config
defaults:
  - modulus_default
  - arch:
      - fourier
      - modified_fourier
      - fully_connected
      - multiscale_fourier
  - scheduler: tf_exponential_lr
  - optimizer: bfgs
  - loss: sum

training:
  rec_results_freq: 1000
  max_steps: 150000

Relevant log output

[23:53:04] - lbfgs optimizer selected. Setting max_steps to 0
[23:53:05] - [step:     100000] lbfgs optimization in running
Error executing job with overrides: []
Traceback (most recent call last):
  File "/mount/data/test/eikonal/eikonal.py", line 313, in run
    slv.solve()
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/solver/solver.py", line 173, in solve
    self._train_loop(sigterm_handler)
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 543, in _train_loop
    loss, losses = self._cuda_graph_training_step(step)
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 730, in _cuda_graph_training_step
    self.apply_gradients()
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 185, in bfgs_apply_gradients
    self.optimizer.step(self.bfgs_closure_func)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 379, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lbfgs.py", line 298, in step
    max_iter = group['max_iter']
KeyError: 'max_iter'
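
One possible reading of the traceback above, offered as an assumption rather than a confirmed diagnosis: the log line "[step:     100000]" suggests the run resumed from an existing checkpoint, and if that checkpoint holds optimizer state saved by a different optimizer (Adam, for example), restoring it into L-BFGS would replace the L-BFGS parameter groups and drop the max_iter key. The sketch below reproduces that mechanism with plain PyTorch only. The model, data, and learning rate are hypothetical stand-ins, and nothing here is taken from the Modulus trainer code.

```python
import torch

# Hypothetical stand-in for a PINN network; any module with parameters works.
model = torch.nn.Linear(2, 1)

# Assumption: an earlier run used Adam and saved its optimizer state.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
adam_state = adam.state_dict()

# A later run builds L-BFGS but restores the Adam state into it.
lbfgs = torch.optim.LBFGS(model.parameters(), max_iter=20)
lbfgs.load_state_dict(adam_state)

# load_state_dict swaps in the saved param_groups, so the
# L-BFGS-specific hyperparameters are no longer present.
has_max_iter = "max_iter" in lbfgs.param_groups[0]

# Hypothetical training data, just enough to call step().
x = torch.randn(8, 2)
y = torch.randn(8, 1)


def closure():
    # L-BFGS re-evaluates the loss several times per step, hence the closure.
    lbfgs.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss


try:
    lbfgs.step(closure)
    error_name = None
except KeyError as exc:
    # Name of the missing param_group key, if step() failed that way.
    error_name = exc.args[0]

print("max_iter present after restore:", has_max_iter)
print("missing key reported by step():", error_name)
```

If this is what happens here, starting the bfgs run from a directory without an optimizer checkpoint saved by another optimizer would be one way to test the hypothesis.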

Environment details

No response

avidcoder123 commented 3 weeks ago

This issue is still active and needs fixing.