Coco58323 opened this issue 3 months ago (status: Open)
I am curious why the training loss drops after each epoch and then tends to flatten out within that epoch. @artidoro
The problem is that, across epochs, the training loss keeps dropping and never actually converges.
I am running 4-bit QLoRA fine-tuning on Alpaca, with about 3000 steps per epoch.
Although the authors have explained that the pre-training/evaluation loss matters less than downstream task performance, it seems like common sense to expect fine-tuning of the pre-trained model to converge well. Is anyone else seeing this?
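
To make the pattern concrete, here is a minimal sketch of how the per-epoch step in the loss could be measured from a logged loss curve. It is plain Python and not part of the QLoRA code; `epoch_boundary_drops`, `losses`, `steps_per_epoch`, and `window` are hypothetical names, and it assumes one loss value is logged per step and that the ~3000 figure above counts steps per epoch.

```python
from statistics import mean


def epoch_boundary_drops(losses, steps_per_epoch, window=50):
    """Summarize a logged training-loss curve epoch by epoch.

    losses          -- one loss value per logged step (assumed format)
    steps_per_epoch -- number of logged steps in one epoch (assumed known)
    window          -- how many steps to average on each side of a boundary

    Returns (epoch_means, drops), where drops[i] is the mean loss over the
    last `window` steps of epoch i minus the mean loss over the first
    `window` steps of epoch i + 1.
    """
    n_epochs = len(losses) // steps_per_epoch  # ignore a trailing partial epoch
    epochs = [
        losses[i * steps_per_epoch:(i + 1) * steps_per_epoch]
        for i in range(n_epochs)
    ]
    epoch_means = [mean(e) for e in epochs]

    w = min(window, steps_per_epoch)
    drops = [
        mean(prev[-w:]) - mean(nxt[:w])
        for prev, nxt in zip(epochs, epochs[1:])
    ]
    return epoch_means, drops


if __name__ == "__main__":
    # Synthetic staircase curve, only to show the call; not real training data.
    fake_losses = [1.0] * 10 + [0.6] * 10 + [0.3] * 10
    means, drops = epoch_boundary_drops(fake_losses, steps_per_epoch=10, window=3)
    print("per-epoch mean loss:", means)
    print("drop at each epoch boundary:", drops)
```

A clearly positive value in `drops` together with a nearly flat curve inside each epoch would match the staircase behavior described above; values near zero would suggest the loss is decreasing smoothly instead.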