A sharp loss drop happens at the end of each epoch

I am curious why the training loss drops sharply at the end of each epoch and then levels off within the epoch. @artidoro The problem is that, across epochs, the training loss keeps dropping and never converges. I am running 4-bit QLoRA fine-tuning on Alpaca, with about 3000 steps per epoch. Although the authors have explained that the pre-training/evaluation loss matters less than downstream task performance, it still seems like common sense to get the pre-trained model to converge well. Has anyone else run into this?
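One way to check that the drops really line up with epoch boundaries is to compare the mean loss just before and just after each boundary. Below is a minimal sketch, assuming the per-step training losses are available as a plain list. The function name `epoch_boundary_drops`, the `window` size, and the synthetic numbers are all hypothetical and are not part of the qlora repo.

```python
from statistics import mean


def epoch_boundary_drops(losses, steps_per_epoch, window=50):
    """Compare mean loss in the `window` steps before and after each epoch boundary.

    Returns a list of (boundary_step, mean_before, mean_after, drop) tuples,
    where drop = mean_before - mean_after (positive means the loss fell).
    """
    if window <= 0 or window > steps_per_epoch:
        raise ValueError("window must be between 1 and steps_per_epoch")
    results = []
    boundary = steps_per_epoch
    # Only use boundaries that have a full window of steps after them.
    while boundary + window <= len(losses):
        before = mean(losses[boundary - window:boundary])
        after = mean(losses[boundary:boundary + window])
        results.append((boundary, before, after, before - after))
        boundary += steps_per_epoch
    return results


# Synthetic example (made-up values): three "epochs" of 10 steps each,
# with a flat loss inside each epoch and a step down at every boundary.
synthetic_losses = [1.0] * 10 + [0.6] * 10 + [0.3] * 10
for step, before, after, drop in epoch_boundary_drops(synthetic_losses, 10, window=5):
    print(f"boundary at step {step}: {before:.3f} -> {after:.3f} (drop {drop:.3f})")
```

For a real run, `steps_per_epoch` would be the roughly 3000 steps mentioned above, and `losses` would come from whatever training log is available. A large drop at every boundary and small changes everywhere else would confirm the staircase pattern described here.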

artidoro/qlora, issue #290