This PR refactors the optimizer and checkpoint functionality and formally supports the `CKPT` and `RESUME` flags in the CLI.
## Features
### Optimizer
- [x] Support param groups configuration. Give users the ability to adjust the `learning_rate` and `weight_decay` of certain parts/modules of the whole model. E.g.
```python
# Create a new param_group that contains all basemodel parameters and sets the lr to 0.1 * base learning rate.
config.optimizers = [
    get_optimizer_cfg(
        ...,
        param_groups_cfg=[{"custom_keys": ["basemodel"], "lr_mult": 0.1}],
    )
]
```
Or create two new param_groups that set the lr to 10.0 * base learning rate for `rpn_cls` & `rpn_box` while setting 20.0 * base learning rate for `fc_cls` & `fc_reg`:
```python
config.optimizers = [
    get_optimizer_cfg(
        ...,
        param_groups_cfg=[
            {
                "custom_keys": [
                    "faster_rcnn_head.rpn_head.rpn_cls.weight",
                    "faster_rcnn_head.rpn_head.rpn_box.weight",
                ],
                "lr_mult": 10.0,
            },
            {
                "custom_keys": [
                    "faster_rcnn_head.roi_head.fc_cls.weight",
                    "faster_rcnn_head.roi_head.fc_reg.weight",
                ],
                "lr_mult": 20.0,
            },
        ],
    )
]
```
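Roughly, such a `param_groups_cfg` can be thought of as expanding into optimizer param groups by matching the `custom_keys` substrings against `model.named_parameters()`. The helper below is a hypothetical sketch of that expansion, not the actual implementation:

```python
import torch

def build_param_groups(model, base_lr, param_groups_cfg):
    """Hypothetical sketch: split parameters into groups by matching custom_keys substrings."""
    groups = [
        {"params": [], "lr": base_lr * cfg.get("lr_mult", 1.0)}
        for cfg in param_groups_cfg
    ]
    default_group = {"params": [], "lr": base_lr}
    for name, param in model.named_parameters():
        for cfg, group in zip(param_groups_cfg, groups):
            if any(key in name for key in cfg["custom_keys"]):
                group["params"].append(param)
                break
        else:
            default_group["params"].append(param)
    return groups + [default_group]

# e.g. torch.optim.SGD(build_param_groups(model, 0.02, param_groups_cfg), lr=0.02)
```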
- [x] Separate the warmup step from the optimizer step to work better with PL and prevent lr_scheduler stepping errors.
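A standalone warmup along these lines (a sketch, not the exact implementation) keeps the per-iteration lr scaling out of the optimizer and the epoch-level lr_scheduler:

```python
def apply_warmup(optimizer, base_lrs, step, warmup_steps=500):
    """Sketch: linearly scale each param group's lr over the first warmup_steps iterations."""
    if step < warmup_steps:
        factor = (step + 1) / warmup_steps
        for group, base_lr in zip(optimizer.param_groups, base_lrs):
            group["lr"] = base_lr * factor
```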
### Checkpoint
- [x] Support loading `state_dict` from `checkpoint["model"]`.
- [x] Refactor `CheckpointCallback` and save the following state for `resume` (see the sketch after this list):
- epoch: current epoch.
- global_step: current global step.
- optimizers: list of optimizer states.
- lr_schedulers: list of learning rate scheduler states.
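For reference, the resume payload could be assembled roughly as below; the helper name is hypothetical, but the keys mirror the list above plus the `checkpoint["model"]` weights mentioned earlier:

```python
import torch

def build_resume_state(epoch, global_step, model, optimizers, lr_schedulers):
    # Payload saved for resume; mirrors the fields listed above, plus the model
    # weights that can later be loaded from checkpoint["model"].
    return {
        "epoch": epoch,
        "global_step": global_step,
        "model": model.state_dict(),
        "optimizers": [opt.state_dict() for opt in optimizers],
        "lr_schedulers": [sched.state_dict() for sched in lr_schedulers],
    }

# torch.save(build_resume_state(...), "checkpoints/last.ckpt")
```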
### CLI
- [x] Support `CKPT` and `RESUME` flags for PL and engine CLI.
- `CKPT` (str): Path to the checkpoint file.
- `RESUME` (bool): If set, load the checkpoint from `CKPT` if specified, otherwise from `config.output_dir/checkpoints/last.ckpt`, to restore the information needed to resume training.
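A minimal sketch of the intended flag semantics (the helper name is hypothetical):

```python
import os

def resolve_checkpoint(ckpt, resume, output_dir):
    # Hypothetical helper mirroring the CKPT / RESUME semantics described above.
    if resume:
        # Resume from CKPT if given, otherwise fall back to the last saved checkpoint.
        return ckpt if ckpt else os.path.join(output_dir, "checkpoints", "last.ckpt")
    return ckpt  # May be None, in which case training starts from scratch.
```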
### Engine
- [x] Support Tensorboard in the trainer (a sketch follows this list).
- Log learning rate.
- Log loss metric.
- Log eval metric.
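A minimal sketch of the kind of logging this enables, using `torch.utils.tensorboard`; the log directory, tag names, and dummy values are illustrative only:

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="output/tensorboard")  # log_dir is illustrative

for global_step in range(3):
    lr = 0.02                        # in practice: optimizer.param_groups[0]["lr"]
    loss = 1.0 / (global_step + 1)   # in practice: the training loss metric
    writer.add_scalar("train/lr", lr, global_step)
    writer.add_scalar("train/loss", loss, global_step)

writer.add_scalar("eval/metric", 0.75, global_step)  # eval metric after validation
writer.close()
```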
### PL
- [x] Save hyperparameters to `hparams.yaml` in the training module.
- [x] Remove `TorchOptimizer` so that the training module uses the resume supported by PL and restores optimizer and lr scheduler states correctly.
- [x] Refactor the `CKPT` and `RESUME` flags. Use `load_model_checkpoint` instead of `trainer.fit(ckpt_path=...)` / `trainer.test(ckpt_path=...)` when not resuming, as sketched below.
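A hedged sketch of that branching; the function name is hypothetical and the exact signature of the project's `load_model_checkpoint` helper is assumed:

```python
from typing import Optional
import pytorch_lightning as pl

def run_training(trainer: pl.Trainer, module: pl.LightningModule, ckpt: Optional[str], resume: bool):
    if resume:
        # Let PL restore epoch, global_step, optimizers and lr schedulers from the checkpoint.
        trainer.fit(module, ckpt_path=ckpt)
    else:
        if ckpt:
            # Only load the model weights; trainer state is not restored.
            load_model_checkpoint(module, ckpt)  # project helper; exact signature assumed
        trainer.fit(module)
```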
### Vis
- [x] Add 3D Box visualization & `imsave`.
## Bug Fixes
- [x] Fix depth evaluation metrics and `KITTIDepthEvaluator`.
- [x] Fix extra step / epoch before stopping in engine.
- [x] Fix `setup_optimizers` to make it possible to set different parameters for multiple optimizers.
- [x] Rename `faster_rcnn_heads` to `faster_rcnn_head` in `Faster R-CNN` related models.