Closed: cliangyu closed this issue 2 years ago
Hello!
You could wrap the learning rate in an absolute value, but that would not solve the underlying issue.
I would try lowering the learning rates for both the images and the learnable learning rate first.
That said, a loss that large implies the synthetic trajectory ended extremely far from the target; losses should typically lie in [0, 1].
You could also try lowering `max_start_epoch`. It's likely that the distance between the starting and target points is simply too small as a result of the expert being mostly converged at that point.
Please let me know if there are still issues after you try this!
@c-liangyu in the Dataset Distillation paper (https://arxiv.org/abs/1811.10959) they applied softplus
to the learnable learning rate to prevent it from becoming negative; maybe that helps you.
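The softplus trick can be sketched as follows. This is a minimal pure-Python illustration of the idea (not code from either repo); in PyTorch you would use `torch.nn.functional.softplus` on the raw parameter, and the function/variable names here are hypothetical:

```python
import math

def softplus(x):
    # softplus(x) = log(1 + exp(x)) is smooth and strictly positive for all x
    return math.log1p(math.exp(x))

def inverse_softplus(y):
    # Choose the raw parameter so that softplus(raw) equals the desired initial LR
    return math.log(math.expm1(y))

lr_init = 0.01
raw = inverse_softplus(lr_init)
assert abs(softplus(raw) - lr_init) < 1e-9

# Even if gradient updates push the raw parameter far negative,
# the effective learning rate used in the inner loop stays > 0.
for r in (raw, -1.0, -10.0):
    assert softplus(r) > 0
```

The optimizer then updates the unconstrained raw parameter, while the inner training loop always consumes `softplus(raw)` as the effective learning rate.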
@GeorgeCazenavette A lower learning rate works for me. Thank you. @cile98 Thank you for the tips. I guess I will give it a shot too : )
Hi! Thank you for your great work.
When I was distilling with my own dataset, there was a very large loss (iter = 0490) and a negative learning rate.
Could you help me figure out what is happening here? What hyperparameters should be adjusted in such a case? Can we implement anything in code to prevent a negative LR?
Thank you!