Closed BZboys closed 2 years ago
Thanks for your interest! During training we sometimes run into the same problem. My solution is to roll back to the latest stable checkpoint and then decrease the learning rate (e.g. 10^-4.5 --> 10^-5).
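A minimal sketch of that recovery step, assuming a standard PyTorch workflow (the model, optimizer, and checkpoint names here are illustrative placeholders, not this repo's actual code):

```python
import torch

# Placeholder model/optimizer standing in for the real network.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=10 ** -4.5)

# Pretend this dict was saved at the last stable training step
# (normally persisted with torch.save and reloaded with torch.load).
checkpoint = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}

# Roll back: restore the stable weights and optimizer state...
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])

# ...then decrease the learning rate, e.g. 10^-4.5 --> 10^-5.
for group in optimizer.param_groups:
    group["lr"] = 1e-5
```

Resume training from this state; if degradation recurs, repeat the rollback with a further LR reduction.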
Tip: note that it is better to set lamda_low_frequency=0 in the pre-training stage. After the model has converged, add the lamda_low_frequency term back to fine-tune the network.
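In other words, the two-stage schedule just toggles the weight on the low-frequency loss term. A hedged sketch (the loss names below are illustrative assumptions, not the repo's actual functions):

```python
def total_loss(main_loss, low_freq_loss, lamda_low_frequency):
    """Combine the main loss with the low-frequency term, weighted by lamda."""
    return main_loss + lamda_low_frequency * low_freq_loss

# Stage 1 (pre-training): lamda_low_frequency = 0, only the main loss trains.
assert total_loss(1.0, 5.0, 0.0) == 1.0

# Stage 2 (fine-tuning, after convergence): enable the low-frequency term,
# e.g. total_loss(1.0, 5.0, 0.1) gives roughly 1.5.
stage2 = total_loss(1.0, 5.0, 0.1)
```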
We have added a training demo for your question to the Readme. Hope this helps~
Thanks for your code! But I cannot reproduce the reported performance with the same training settings. The model is prone to degradation during training. Looking forward to your reply!