This suggests that the learning rate is computed as the maximum of the minimum learning rate and the exponentially decayed rate. But in the configuration file, the learning rate and the minimum learning rate are set to the same value. Since exponential decay only ever lowers the rate below its starting value, that maximum will always be the minimum learning rate, so the learning rate will never change as training progresses.
Or is this intended specifically for the case where the learning rate should not be updated at all?
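To illustrate what I mean, here is a minimal sketch. It assumes the schedule is the usual exponential decay, `initial_lr * decay_rate ** (step / decay_steps)`, floored at the minimum learning rate. The function name, parameters, and numeric values are hypothetical and not taken from the repo or its configuration file.

```python
def decayed_learning_rate(initial_lr, min_lr, decay_rate, decay_steps, step):
    """Hypothetical sketch: exponential decay clipped from below at min_lr."""
    # Plain exponential decay; with decay_rate < 1 this only ever decreases.
    decayed = initial_lr * decay_rate ** (step / decay_steps)
    # Take the maximum with the floor, as the code in question appears to do.
    return max(min_lr, decayed)


steps = (0, 1000, 5000, 20000)

# Learning rate and minimum learning rate set to the same value:
# the floor always wins, so the schedule is flat.
schedule_same = [decayed_learning_rate(1e-3, 1e-3, 0.96, 1000, s) for s in steps]

# A lower floor lets the rate actually decay over these steps.
schedule_lower = [decayed_learning_rate(1e-3, 1e-5, 0.96, 1000, s) for s in steps]

print(schedule_same)
print(schedule_lower)
```

With equal values the first list never changes, which is the behaviour I am asking about.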
Thanks.