Open devavratTomar opened 9 months ago
Thank you for your work. While trying your code for the Resnext backbone on cifar100, I get nan values for the training loss. As mentioned in the published paper, I use the initial learning rate of 0.1 for SGD with cosine scheduling.
Yes, same here. Could you please help with this?
Thanks, Sara
Thank you for your work. While trying your code for the Resnext backbone on cifar100, I get nan values for the training loss. As mentioned in the published paper, I use the initial learning rate of 0.1 for SGD with cosine scheduling.