Closed ShenXianwen closed 5 years ago
Hi, thanks for your attention. @ShenXianwen
Did you set a small batch size? What was the number, exactly? When did the NaN appear — during training after several epochs, or right in the first epoch?
Sorry, I cannot try it myself since I have no access to the servers until next week.
Hello, thanks for your reply. I set the batch size to 64. The NaN appeared during training after 2 epochs. I didn't use your prepared data (MSMT17.mat and Market.mat); I followed your steps and ran construct_dataset_Market.m and construct_dataset_MSMT17.m in MATLAB. But I did use prepared_weight.pth.
OK, let me try it next week when I have access to the servers.
Hi @ShenXianwen, it turns out the NaN appears because the default learning rate is too large for a small batch size like 64. A small batch size produces a stronger, sharper gradient (a large batch averages over more samples, which smooths the gradient), so the learning rate needs to be turned down. I did not experiment much, but dividing the learning rate by 10 should get rid of this problem.
However, note that performance will probably drop, since the distribution estimation is less precise with a small batch size.
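The advice above can be sketched in code. This is a minimal, hypothetical sketch (the names `DEFAULT_LR`, `DEFAULT_BATCH_SIZE`, and `scaled_lr` are assumptions, not identifiers from this repo): either divide the default learning rate by 10 as suggested, or scale it linearly with the batch size.

```python
# Assumed defaults for illustration only; check the repo's actual
# config for the real values.
DEFAULT_LR = 0.01         # assumed default learning rate
DEFAULT_BATCH_SIZE = 256  # assumed batch size the default lr was tuned for

def scaled_lr(batch_size, base_lr=DEFAULT_LR, base_batch=DEFAULT_BATCH_SIZE):
    """Linear scaling rule: keep lr proportional to batch size."""
    return base_lr * batch_size / base_batch

# The simple fix suggested above: divide the default lr by 10
# when training with batch size 64.
lr_for_64 = DEFAULT_LR / 10
```

Either value is a starting point to tune from, not a guaranteed optimum.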
Hello, thanks for sharing your work.
When I run the code, the loss becomes NaN. I only changed the batch size; all other parameters use the default values.
I don't know how to solve it. Could you give me some suggestions? Thanks.
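When debugging a NaN loss like this, it can help to fail fast instead of training for epochs on garbage. A small hedged sketch (the `check_loss` helper is hypothetical, not part of this repo) that aborts as soon as the loss goes non-finite:

```python
import math

def check_loss(loss_value, step):
    """Raise immediately if the loss is NaN or infinite, so the bad
    step is caught right away instead of epochs later."""
    if math.isnan(loss_value) or math.isinf(loss_value):
        raise RuntimeError(
            f"Loss became non-finite at step {step}; "
            "try lowering the learning rate."
        )
    return loss_value
```

Calling this on the scalar loss each iteration pinpoints exactly when the divergence starts, which makes it easier to correlate with the learning rate and batch size.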