Hi!
I'm currently experimenting with a custom X-UNet to train a conditional model with my own embeddings. I'm also using classifier-free guidance with a probability of 0.1. I tested the whole pipeline beforehand on a toy UNet. Training seems to work fine, and the samples generated at checkpoints are indeed getting better and increasingly follow the conditioning signals.
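For reference, conditioning dropout for classifier-free guidance can be sketched as below. This is only a minimal illustration, assuming the 0.1 above is the per-sample probability of replacing the conditioning with a null embedding. The names `drop_conditioning`, `null_cond`, and `p_uncond` are hypothetical, and plain Python lists stand in for tensors.

```python
import random


def drop_conditioning(cond_batch, null_cond, p_uncond=0.1, rng=random):
    """Replace each sample's conditioning with `null_cond` with probability `p_uncond`.

    Returns the (possibly replaced) conditioning list and a list of booleans
    marking which samples were dropped, so the flags can be logged per step.
    """
    out = []
    dropped = []
    for cond in cond_batch:
        # rng.random() is uniform in [0, 1), so this is True with probability p_uncond.
        drop = rng.random() < p_uncond
        out.append(null_cond if drop else cond)
        dropped.append(drop)
    return out, dropped


# Example: a batch of 8 toy embeddings, with a hypothetical all-zero null embedding.
batch = [[float(i), float(i) + 0.5] for i in range(1, 9)]
null = [0.0, 0.0]
conds, flags = drop_conditioning(batch, null, p_uncond=0.1, rng=random.Random(0))
```

Returning the drop flags alongside the conditioning makes it possible to record, for every training step, how many samples in the batch were trained unconditionally.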
However, I noticed some strange behaviour in the training loss curve. During the first few iterations, the loss drops quickly and reaches a plateau, stabilizing at a mean value of approximately 0.02. After some time, however, spikes with values around 0.25 start to appear in the loss curve. They are non-periodic and not epoch-dependent: they start at some point and from then on occur at random, roughly every 200 steps on average. I also tried restarting the training from the last checkpoint: initially the spikes disappeared, but they reappeared after a further ~20k steps (see the attached image).
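To pin down exactly which steps spike, the logged per-step losses can be scanned against a rolling baseline. The sketch below is one possible way to do this, not part of the training code: the function name `find_loss_spikes`, the window of 100 steps, and the threshold of 5 times the rolling median are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import median


def find_loss_spikes(losses, window=100, factor=5.0):
    """Return (step, loss) pairs where the loss exceeds `factor` times the
    median of the previous `window` non-spike losses."""
    history = deque(maxlen=window)  # rolling baseline of recent "normal" losses
    spikes = []
    for step, loss in enumerate(losses):
        # Only start flagging once the baseline window is full.
        if len(history) == window and loss > factor * median(history):
            spikes.append((step, loss))
            continue  # keep spikes out of the baseline
        history.append(loss)
    return spikes


# Toy curve resembling the description: plateau at 0.02 with two spikes at 0.25.
toy_losses = [0.02] * 300
toy_losses[150] = 0.25
toy_losses[250] = 0.25
toy_spikes = find_loss_spikes(toy_losses)
```

Logging the sampled timesteps and the batch contents for the flagged steps would then show whether those batches have anything in common.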
Has anybody experienced similar behaviour? What could be the cause? Is it a problem?
Thank you!