cclamd closed this 8 months ago
When the batch size is small, an entire batch may consist solely of abnormal samples, which breaks the calculation of the paper's loss (Formula 9). We have fixed this bug. Please try again.
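As a rough illustration of this failure mode (not the repository's actual code, and not Formula 9 itself), the sketch below assumes the loss averages a per-sample term over only the normal samples in a batch. The names `masked_mean`, `safe_masked_mean`, and `is_normal` are hypothetical. When no sample is selected, the average is 0/0, which tensor libraries typically report as NaN; plain Python is used here to mimic that behavior.

```python
import math


def masked_mean(values, is_normal):
    """Average `values` over entries flagged in `is_normal`.

    Mimics the tensor-library behavior where averaging an empty
    selection yields NaN (assumption: the loss averages over normal samples).
    """
    selected = [v for v, keep in zip(values, is_normal) if keep]
    if not selected:
        return math.nan  # 0/0: no normal samples in this batch
    return sum(selected) / len(selected)


def safe_masked_mean(values, is_normal):
    """Same average, but contribute 0.0 when the batch has no normal samples."""
    selected = [v for v, keep in zip(values, is_normal) if keep]
    if not selected:
        return 0.0  # skip this term instead of propagating NaN
    return sum(selected) / len(selected)


# A small batch that happens to contain only abnormal samples.
per_sample_loss = [0.7, 1.2]
is_normal = [False, False]

unguarded = masked_mean(per_sample_loss, is_normal)
guarded = safe_masked_mean(per_sample_loss, is_normal)
```

With a larger batch the selection is rarely empty, which is one plausible reason the problem shows up mainly at small batch sizes.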
Hi @HuiZhang0812,
I don't have an A100 (80G), so I trained the model on six 3090 cards with distributed training, but the loss is also NaN! Does the batch size decrease under distributed training?
Best regards!