Negative loss when pre-training on CC12M

Optimization-AI / fast_clip

MIT License


Negative loss when pre-training on CC12M #2

ivonajdenkoska closed this issue 4 months ago

ivonajdenkoska commented 4 months ago

Hi,

Thanks a lot for sharing the code. I'm trying to reproduce the pre-training on CC12M using 4 H100 GPUs, but the loss becomes negative after training for a while (see the screenshot). Have you also observed this? Thanks in advance!

xywei00 commented 4 months ago

Yes, this is normal. In short, the reported "loss" is only a surrogate term that makes it convenient to compute the gradient estimator used in compositional optimization; it is not the value of the loss function itself, so it can be negative.
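To make this concrete, here is a minimal toy sketch of how a surrogate term can be negative while the objective it stands in for is not. It is not the fast_clip code: the InfoNCE-style objective, the temperature value, the scalar parameter `w`, and the helper names (`diffs`, `true_loss`, `weights`, `surrogate`) are all assumptions made up for illustration. The weights are computed once and then held constant, which plays the role of a stop-gradient; the surrogate is the weighted sum of similarity gaps, so its gradient matches the gradient of the true objective even though its value does not.

```python
import math

TAU = 0.1  # temperature (assumed value, for illustration only)

def diffs(w):
    """Hypothetical similarity gaps d_j(w) = s_neg_j - s_pos for three negatives."""
    return [-0.5 * w - 0.1, -0.3 * w - 0.2, -0.8 * w + 0.05]

def true_loss(w):
    # InfoNCE-style objective: tau * log(1 + sum_j exp(d_j / tau)).
    # The argument of the log is > 1, so this value is always positive.
    return TAU * math.log(1.0 + sum(math.exp(d / TAU) for d in diffs(w)))

def weights(w0):
    # Softmax-like weights evaluated at w0 and then treated as constants
    # (the analogue of detaching them from the computation graph).
    e = [math.exp(d / TAU) for d in diffs(w0)]
    z = 1.0 + sum(e)
    return [x / z for x in e]

def surrogate(w, q):
    # Term whose gradient w.r.t. w equals the gradient of true_loss at w0.
    # Its value is a weighted sum of the gaps d_j, which are negative here.
    return sum(qj * dj for qj, dj in zip(q, diffs(w)))

def num_grad(f, w, h=1e-6):
    # Central finite difference, used only to compare the two gradients.
    return (f(w + h) - f(w - h)) / (2 * h)

w0 = 1.0
q = weights(w0)

loss_value = true_loss(w0)
surrogate_value = surrogate(w0, q)
grad_true = num_grad(true_loss, w0)
grad_surrogate = num_grad(lambda w: surrogate(w, q), w0)

print("true loss      :", loss_value)
print("surrogate term :", surrogate_value)
print("grad (true)    :", grad_true)
print("grad (surrog.) :", grad_surrogate)
```

In this sketch the two gradients agree up to finite-difference error, while the surrogate value is negative and the true loss is positive. A training log that prints the surrogate would therefore show a negative "loss" without anything being wrong with the optimization.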