Current state
CostCoefWarmup and PanelOptConfig both act by 'freezing' detector optimisation for a few warm-up epochs, in order to set suitable aspects of the loss and optimisers based on the initial state of the detector.
They currently implement freezing by setting the optimiser learning rates to zero.
Potential problems
Setting the LR to zero works fine for plain SGD without momentum. However, if the optimiser uses momentum, it will still accumulate momentum during the warm-up and can then change the detectors through updates based on this accumulated momentum. Additionally, some optimisers, e.g. Adam and RMSProp, track a history of gradients which is used to adapt the effective learning rate and momentum coefficients of future updates, so even setting the momentum coefficient to zero won't help (nor would setting the gradients to zero).
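As an illustration of the second point, here is a minimal sketch in plain PyTorch (independent of the TomOpt classes above) showing that Adam keeps updating its internal gradient statistics even while the learning rate is zero, so the 'frozen' warm-up period still biases the first update made after the learning rate is restored:

```python
import torch

param = torch.nn.Parameter(torch.ones(1))
opt = torch.optim.Adam([param], lr=1e-1)

# "Freeze" by zeroing the learning rate, as currently done
for group in opt.param_groups:
    group["lr"] = 0.0

for _ in range(5):  # warm-up steps with lr = 0
    opt.zero_grad()
    (2.0 * param).sum().backward()  # constant gradient of 2
    opt.step()  # the parameter is unchanged, but the optimiser state is not

print(param)                        # still 1.0: no visible update
print(opt.state[param]["exp_avg"])  # non-zero: gradient history accumulated while "frozen"

# Once the learning rate is restored, the next step is already biased by the
# statistics gathered during the warm-up period.
for group in opt.param_groups:
    group["lr"] = 1e-1
```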
Proposed solution
AbsDetectorLoss and PanelOptConfig should not alter the optimiser hyper-parameters; instead they should set a flag in the volume wrapper's fit_params that tells the wrapper to skip the optimiser step calls.
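A minimal sketch of how this flag-based freezing could look; apart from fit_params, the names used here (FitParams, skip_opt_step, on_epoch_begin, n_warmup_epochs) are illustrative assumptions rather than the existing TomOpt API:

```python
from dataclasses import dataclass


@dataclass
class FitParams:
    """Subset of the state carried by the volume wrapper during a fit (illustrative)."""

    skip_opt_step: bool = False  # when True, the wrapper skips its optimiser steps


class CostCoefWarmup:
    """Warm-up callback that freezes detector optimisation via a flag."""

    def __init__(self, n_warmup_epochs: int) -> None:
        self.n_warmup_epochs = n_warmup_epochs

    def on_epoch_begin(self, fit_params: FitParams, epoch: int) -> None:
        # Freeze by flag rather than by zeroing learning rates: the optimiser
        # hyper-parameters and internal state are left untouched.
        fit_params.skip_opt_step = epoch < self.n_warmup_epochs


# Inside the volume wrapper's training loop (only the relevant part, as comments):
#     loss.backward()
#     if not fit_params.skip_opt_step:
#         for opt in self.optimisers:
#             opt.step()
#     for opt in self.optimisers:
#         opt.zero_grad()
```

Because the optimisers are never stepped during the warm-up, no momentum or gradient history is accumulated, and unfreezing simply resumes normal updates.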