Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

LightningCLI doesn't save optimizer's configuration if not explicitly given #20103

Open adosar opened 1 month ago

adosar commented 1 month ago

Bug description

From the Docs:

To ease experiment reporting and reproducibility, by default LightningCLI automatically saves the full YAML configuration in the log directory. After multiple fit runs with different hyperparameters, each one will have in its respective log directory a config.yaml file.

I would expect the saved config.yaml to include the options for --optimizer etc. even if they were not passed, just as it populates config.yaml with seed_everything even when --seed_everything wasn't explicitly passed on the CLI.

What version are you seeing the problem on?

v2.2

How to reproduce the bug

# main.py
from lightning.pytorch.cli import LightningCLI

# simple demo classes for your convenience
from lightning.pytorch.demos.boring_classes import DemoModel, BoringDataModule

def cli_main():
    cli = LightningCLI(DemoModel, BoringDataModule)
    # note: don't call fit!!

if __name__ == "__main__":
    cli_main()

Now from the console:

python main.py fit --trainer.max_epochs=1

Error messages and logs

The config.yaml under lightning_logs/version_0 is this:

# lightning.pytorch==2.2.1
seed_everything: 0
trainer:
  accelerator: auto
  strategy: auto
  devices: auto
  num_nodes: 1
  precision: null
  logger: null
  callbacks: null
  fast_dev_run: false
  max_epochs: 1
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: null
  limit_val_batches: null
  limit_test_batches: null
  limit_predict_batches: null
  overfit_batches: 0.0 
  val_check_interval: null
  check_val_every_n_epoch: 1
  num_sanity_val_steps: null
  log_every_n_steps: null
  enable_checkpointing: null
  enable_progress_bar: null
  enable_model_summary: null
  accumulate_grad_batches: 1
  gradient_clip_val: null
  gradient_clip_algorithm: null
  deterministic: null
  benchmark: null
  inference_mode: true
  use_distributed_sampler: true
  profiler: null
  detect_anomaly: false
  barebones: false
  plugins: null
  sync_batchnorm: false
  reload_dataloaders_every_n_epochs: 0
  default_root_dir: null
model:
  out_dim: 10
  learning_rate: 0.02
ckpt_path: null

Environment

Current environment

```
#- PyTorch Lightning Version (e.g., 1.5.0):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
```

More info

No response

cc @carmocca @mauvilsa

awaelchli commented 1 month ago

Hey @adosar. Hmm, Lightning doesn't have a concept of a "default optimizer", so I don't think the CLI / jsonargparse could ever know what that should be for your model. The default is simply whatever you return from configure_optimizers().
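
To illustrate, here is a minimal sketch (not the actual DemoModel source): the default optimizer is ordinary Python code inside the model, so there is nothing for the parser to introspect and dump into config.yaml unless --optimizer is passed explicitly.

```python
import torch
from lightning.pytorch import LightningModule


class MyModel(LightningModule):
    """Sketch of a model whose 'default' optimizer is hidden in code."""

    def __init__(self, out_dim: int = 10, learning_rate: float = 0.02):
        super().__init__()
        self.layer = torch.nn.Linear(32, out_dim)
        self.learning_rate = learning_rate

    def configure_optimizers(self):
        # Whatever is returned here is the de-facto default. It is plain Python,
        # not a CLI argument, so the saved config.yaml cannot record it.
        return torch.optim.SGD(self.parameters(), lr=self.learning_rate)
```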

mauvilsa commented 1 month ago

@adosar the current behavior comes from https://github.com/omni-us/jsonargparse/pull/90. Maybe the default should be None instead of SUPPRESS. I am not sure at the moment whether changing that would have unexpected consequences. But note that with this change the saved config would contain optimizer: null. Is this what you are expecting?

mauvilsa commented 1 month ago

Also note that the automatic configure_optimizers is just a convenience for the most basic cases. To have configurable optimizers, the recommended approach is the one described here.
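
For reference, a minimal sketch of that recommended pattern, using the OptimizerCallable type from lightning.pytorch.cli; the class name and layer sizes are illustrative, not from the issue:

```python
import torch
from lightning.pytorch import LightningModule
from lightning.pytorch.cli import OptimizerCallable


class ConfigurableModel(LightningModule):
    """Sketch: the optimizer is an __init__ argument, so the CLI can see it."""

    def __init__(self, optimizer: OptimizerCallable = torch.optim.Adam):
        super().__init__()
        self.optimizer = optimizer
        self.layer = torch.nn.Linear(32, 2)

    def configure_optimizers(self):
        # The CLI injects a callable that, given the parameters, builds the optimizer.
        return self.optimizer(self.parameters())
```

With this pattern the optimizer is part of the model's __init__ signature, so it becomes a regular CLI argument (--model.optimizer ...) instead of being hard-coded inside configure_optimizers.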

adosar commented 1 month ago

@mauvilsa Yes, this is what I was expecting: optimizer: null. This is of course just my opinion, but I think including all the configuration settings, even those not explicitly given, is in full accordance with:

by default LightningCLI automatically saves the full YAML configuration in the log directory.

However, since the automatic configuration of optimizers/schedulers doesn't cover all cases, it might be more appropriate to simply emphasize this in the docs.