Thanks for your great work!
From the training log I can see that Adam is very stable, but it converges slowly. I tried `torch.optim.LBFGS` to speed up convergence, but I don't understand why `NaN` always appears during training. Can you suggest a solution? This has been troubling me for a long time; looking forward to your reply!
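In case it helps to reproduce: below is a minimal sketch of how I am calling L-BFGS. The model, data, and loss here are placeholders standing in for my actual setup; the `NaN` shows up in a loop like this one.

```python
import torch

# Placeholder model/data/loss standing in for my real training setup
model = torch.nn.Linear(10, 1)
loss_fn = torch.nn.MSELoss()
inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

optimizer = torch.optim.LBFGS(
    model.parameters(),
    lr=1.0,                        # L-BFGS default step size
    max_iter=20,
    line_search_fn="strong_wolfe"  # without a line search the step can overshoot
)

def closure():
    # L-BFGS re-evaluates the loss several times per step, so it needs a closure
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    return loss

for epoch in range(100):
    loss = optimizer.step(closure)
    if torch.isnan(loss):
        print(f"NaN at epoch {epoch}")  # this is where training blows up
        break
```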