Current state
PanelOptConfig sets learning rates for the optimisers based on the gradients on their parameters during a frozen warm-up period. However, CostCoefWarmup can also be learning a suitable coefficient to balance the detector cost and performance sub-losses during the same warm-up period.
Problem
When CostCoefWarmup sets the new balancing coefficient, the losses and gradients on the parameters can be very different to those recorded by PanelOptConfig, meaning that the learning rates it sets are no longer suitable.
Proposed solution
PanelOptConfig checks for the presence of CostCoefWarmup in the wrapper callbacks, and delays its own monitoring and warm-up period until after CostCoefWarmup has finished. This ensures that the gradients it tracks are representative of those that will actually be seen once the warm-up has finished and the balancing coefficient is fixed.
A more general approach would probably be to create a new list of "warm-up" callbacks in the volume wrapper fit_params, and have them wait their turn to run their warm-up periods.
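A minimal sketch of the "wait their turn" idea, to make the proposal concrete. Everything here is hypothetical and is not the existing API: `FitParams`, `WarmupCallback`, `warmup_cbs`, `on_epoch_begin`, `on_epoch_end` and `run_schedule` are stand-in names, and the real callbacks would do their monitoring where the counter is incremented. Each callback registers itself in a shared list and only becomes active once every callback ahead of it has finished its warm-up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FitParams:
    """Hypothetical stand-in for the volume wrapper's fit_params."""
    warmup_cbs: List["WarmupCallback"] = field(default_factory=list)


class WarmupCallback:
    """Hypothetical base class: warms up for n_warmup epochs, one callback at a time."""

    def __init__(self, name: str, n_warmup: int) -> None:
        self.name = name
        self.n_warmup = n_warmup
        self.epochs_done = 0
        self.active = False
        self.fit_params = None

    def set_fit_params(self, fit_params: FitParams) -> None:
        # Registration order defines the order of the warm-up periods
        self.fit_params = fit_params
        fit_params.warmup_cbs.append(self)

    @property
    def finished(self) -> bool:
        return self.epochs_done >= self.n_warmup

    def _is_my_turn(self) -> bool:
        # It is our turn only if every callback ahead of us has finished
        for cb in self.fit_params.warmup_cbs:
            if cb is self:
                return not self.finished
            if not cb.finished:
                return False
        return False

    def on_epoch_begin(self) -> None:
        # Decide once per epoch, so that two callbacks never share an epoch
        self.active = self._is_my_turn()

    def on_epoch_end(self) -> None:
        if self.active:
            # A real callback would record gradients or losses here
            self.epochs_done += 1


def run_schedule(cbs: List[WarmupCallback], n_epochs: int) -> List[str]:
    """Simulates the epoch loop and reports which callback was warming up in each epoch."""
    fit_params = FitParams()
    for cb in cbs:
        cb.set_fit_params(fit_params)
    log = []
    for _ in range(n_epochs):
        for cb in cbs:
            cb.on_epoch_begin()
        active = [cb.name for cb in cbs if cb.active]
        log.append(active[0] if active else "train")
        for cb in cbs:
            cb.on_epoch_end()
    return log


# The coefficient warm-up is registered first, so the learning-rate warm-up waits for it
schedule = run_schedule(
    [WarmupCallback("CostCoefWarmup", 2), WarmupCallback("PanelOptConfig", 3)],
    n_epochs=6,
)
print(schedule)
```

With this ordering, the learning-rate warm-up only sees epochs in which the balancing coefficient is already fixed. Swapping the order of the list would reverse the schedule without changing either callback.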