SysCV / vis4d

A modular library for visual 4D scene understanding
https://docs.vis.xyz/4d/index.html
Apache License 2.0

Optimizer and Checkpoint Refactor #107

Closed RoyYang0714 closed 1 year ago

RoyYang0714 commented 1 year ago

This PR refactors the optimizer and checkpoint functionality. It also formally supports the `CKPT` and `RESUME` flags in the CLI.

## Features

### Optimizer

Create two new param groups via `param_groups_cfg`: one sets the learning rate of `rpn_cls` and `rpn_box` to 10.0 times the base learning rate, and the other sets the learning rate of `fc_cls` and `fc_reg` to 20.0 times the base learning rate.

```python
config.optimizers = [
    get_optimizer_cfg(
        ...
        param_groups_cfg=[
            {
                "custom_keys": [
                    "faster_rcnn_head.rpn_head.rpn_cls.weight",
                    "faster_rcnn_head.rpn_head.rpn_box.weight",
                ],
                "lr_mult": 10.0,
            },
            {
                "custom_keys": [
                    "faster_rcnn_head.roi_head.fc_cls.weight",
                    "faster_rcnn_head.roi_head.fc_reg.weight",
                ],
                "lr_mult": 20.0,
            },
        ],
    )
]
```
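To illustrate what such a config could translate to, here is a minimal sketch of how `custom_keys` and `lr_mult` might be turned into optimizer param groups. It is not the vis4d implementation: `build_param_groups` is a hypothetical helper, substring matching of keys against parameter names is an assumption, and plain strings stand in for real parameter tensors.

```python
from __future__ import annotations

from typing import Any


def build_param_groups(
    named_params: list[tuple[str, Any]],
    base_lr: float,
    param_groups_cfg: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Hypothetical helper: split parameters into groups with scaled lr."""
    # One group per config entry, with lr = base_lr * lr_mult.
    groups = [
        {"params": [], "lr": base_lr * cfg["lr_mult"]}
        for cfg in param_groups_cfg
    ]
    # Parameters that match no custom key keep the base learning rate.
    default_group = {"params": [], "lr": base_lr}

    for name, param in named_params:
        for cfg, group in zip(param_groups_cfg, groups):
            # Assumption: a key matches if it is a substring of the name.
            if any(key in name for key in cfg["custom_keys"]):
                group["params"].append(param)
                break
        else:
            default_group["params"].append(param)

    return [default_group] + groups


# Strings stand in for parameter tensors in this sketch.
named_params = [
    ("backbone.conv1.weight", "w_backbone"),
    ("faster_rcnn_head.rpn_head.rpn_cls.weight", "w_rpn_cls"),
    ("faster_rcnn_head.roi_head.fc_cls.weight", "w_fc_cls"),
]
param_groups_cfg = [
    {
        "custom_keys": ["faster_rcnn_head.rpn_head.rpn_cls.weight"],
        "lr_mult": 10.0,
    },
    {
        "custom_keys": ["faster_rcnn_head.roi_head.fc_cls.weight"],
        "lr_mult": 20.0,
    },
]
groups = build_param_groups(named_params, 0.01, param_groups_cfg)
```

The resulting list has the same shape as the per-parameter options accepted by PyTorch optimizers (a list of dicts with `params` and `lr`), so with real tensors it could be passed to an optimizer constructor directly.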


- [x] Separate the warmup step from the optimizer step to work better with PL and prevent `lr_scheduler` stepping errors.

### Checkpoint
- [x] Support loading `state_dict` from `checkpoint["model"]`.
- [x] Refactor `CheckpointCallback` and save the following state for `resume`:
  - epoch: current epoch.
  - global_step: current global step.
  - optimizers: list of optimizer states.
  - lr_schedulers: list of learning rate schedulers states.
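The checkpoint contents listed above can be sketched as a plain dictionary. This is an illustrative sketch only: `build_resume_state`, `extract_model_state`, and `FakeStateful` are hypothetical names, and the fallback of treating a checkpoint without a `"model"` key as the state dict itself is an assumption, not something stated in this PR.

```python
from __future__ import annotations

from typing import Any


def build_resume_state(
    epoch: int,
    global_step: int,
    optimizers: list[Any],
    lr_schedulers: list[Any],
) -> dict[str, Any]:
    """Hypothetical helper: collect the state needed to resume training."""
    return {
        "epoch": epoch,
        "global_step": global_step,
        # Each optimizer / scheduler is expected to expose state_dict().
        "optimizers": [opt.state_dict() for opt in optimizers],
        "lr_schedulers": [sched.state_dict() for sched in lr_schedulers],
    }


def extract_model_state(checkpoint: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: read weights from checkpoint["model"] if present."""
    if "model" in checkpoint:
        return checkpoint["model"]
    # Assumption: otherwise the checkpoint is already a bare state dict.
    return checkpoint


class FakeStateful:
    """Stand-in for an optimizer or scheduler with a state_dict() method."""

    def __init__(self, state: dict[str, Any]) -> None:
        self._state = state

    def state_dict(self) -> dict[str, Any]:
        return dict(self._state)


resume_state = build_resume_state(
    epoch=3,
    global_step=1200,
    optimizers=[FakeStateful({"lr": 0.01})],
    lr_schedulers=[FakeStateful({"last_epoch": 3})],
)
weights = extract_model_state({"model": {"w": 1}, "epoch": 3})
```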

### CLI
- [x] Support `CKPT` and `RESUME` flags for PL and engine CLI.
  - `CKPT` (str): Path to the checkpoint file.
  - `RESUME` (bool):
    - If set, training resumes from the checkpoint at `CKPT` if specified, and otherwise from `config.output_dir/checkpoints/last.ckpt`, restoring the state needed to continue training.
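The flag semantics described above can be sketched as a small path-resolution function. `resolve_checkpoint` is a hypothetical name used for illustration and is not part of the vis4d CLI; the only details taken from the PR are the `CKPT` / `RESUME` behavior and the `checkpoints/last.ckpt` default.

```python
from __future__ import annotations

import os


def resolve_checkpoint(
    ckpt: str | None, resume: bool, output_dir: str
) -> str | None:
    """Hypothetical helper: pick the checkpoint path from CKPT and RESUME."""
    if ckpt is not None:
        # An explicit CKPT path always wins, with or without RESUME.
        return ckpt
    if resume:
        # RESUME without CKPT falls back to the last saved checkpoint.
        return os.path.join(output_dir, "checkpoints", "last.ckpt")
    # Neither flag given: start from scratch.
    return None


explicit = resolve_checkpoint("weights/model.ckpt", True, "runs/exp1")
fallback = resolve_checkpoint(None, True, "runs/exp1")
fresh = resolve_checkpoint(None, False, "runs/exp1")
```

Whether the optimizer and scheduler state is also restored from the resolved file would then depend on `RESUME`; without it, only the model weights would be loaded.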

### Engine
- [x] Support TensorBoard in the trainer.
  - Log learning rate.
  - Log loss metric.
  - Log eval metric. 

### PL
- [x] Save hyperparameters to `hparams.yaml` in the training module.
- [x] Remove `TorchOptimizer` from the training module so that it uses the resume support built into PL and restores the optimizer and lr scheduler states correctly.
- [x] Refactor the `CKPT` and `RESUME` flags. When not resuming, use `load_model_checkpoint` instead of `trainer.fit(ckpt_path=...)` / `trainer.test(ckpt_path=...)`.

### Vis
- [x] Add 3D Box visualization & `imsave`.

## Bug Fixes
- [x] Fix depth evaluation metrics and KITTIDepthEvaluator.
- [x] Fix extra step / epoch before stopping in engine.
- [x] Fix `setup_optimizers` to make it possible to set different parameters for multiple optimizers.
- [x] Fix the name `faster_rcnn_heads` to `faster_rcnn_head` in Faster R-CNN related models.

suniique commented 1 year ago

LGTM!