Overflow risk when calculating softmax?

realgump commented 1 year ago

When I increased the batch size from 8 to 16, test accuracy dropped by 20%.

Upon examination, I found that the softmax output of the attention layer, computed at https://github.com/Sense-X/UniFormer/blob/849cd0cd3b163f84102b1a799019a689d8d3fb8a/video_classification/slowfast/models/uniformer.py#L89, had an anomalous maximum value greater than 1 for the last 4 of the 16 videos in the batch. A correct softmax output should never exceed 1.

As noted in issue #73, inference with a large batch size can cause the softmax to overflow.

Is that the reason the accuracy varies with batch size? And are there any suggestions for avoiding this issue other than decreasing the batch size?

Andy1621 commented 1 year ago

Yes, the overflow is what causes the accuracy drop. For me, the best solution is to use a small batch size when testing with a large model, high resolution, or many frames. Alternatively, you could try disabling AMP (automatic mixed precision), though I'm not sure whether running in FP32 will help.
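
To illustrate why computing the attention logits and softmax in FP32 might help, here is a minimal NumPy sketch. It is not the code from uniformer.py: the function names `attention_probs` and `is_valid_softmax`, the toy inputs, and the scaled dot-product form are all assumptions made for illustration. It shows one way FP16 overflow can corrupt a softmax (the logits become inf, which turns into NaN), and this may not be the exact symptom reported above.

```python
import numpy as np

def attention_probs(q, k, compute_dtype):
    """Hypothetical helper: softmax(q @ k^T * scale), computed in compute_dtype."""
    q = q.astype(compute_dtype)
    k = k.astype(compute_dtype)
    scale = compute_dtype(q.shape[-1] ** -0.5)
    # Silence overflow/invalid warnings so the FP16 failure can be inspected.
    with np.errstate(all="ignore"):
        logits = (q @ k.T) * scale
        # Subtract the row maximum before exponentiating (standard stable softmax).
        logits = logits - logits.max(axis=-1, keepdims=True)
        exp = np.exp(logits)
        return exp / exp.sum(axis=-1, keepdims=True)

def is_valid_softmax(p, tol=1e-2):
    """Check that p is finite, lies in [0, 1], and that each row sums to 1."""
    p = np.asarray(p, dtype=np.float64)
    return bool(
        np.isfinite(p).all()
        and (p >= 0).all()
        and (p <= 1 + tol).all()
        and np.allclose(p.sum(axis=-1), 1.0, atol=tol)
    )

# Toy inputs with large activations: 300 * 300 + 300 * 300 = 180000,
# which exceeds the largest finite FP16 value (65504).
q = np.array([[300.0, 300.0], [1.0, 2.0]])
k = np.array([[300.0, 300.0], [1.0, 0.0]])

probs_fp16 = attention_probs(q, k, np.float16)
probs_fp32 = attention_probs(q, k, np.float32)

print("FP16 valid:", is_valid_softmax(probs_fp16))
print("FP32 valid:", is_valid_softmax(probs_fp32))
```

A check like `is_valid_softmax` can also be used to flag the affected samples in a batch. In PyTorch, the analogous experiment would be to cast `q` and `k` to `float32` and compute the attention inside a `torch.autocast(..., enabled=False)` region. Whether that restores accuracy for this model is untested here.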

realgump commented 1 year ago

Thanks a lot.

Sense-X / UniFormer, issue #108