artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

A critical loss drop happens after each epoch ends #290

Open Coco58323 opened 3 months ago

Coco58323 commented 3 months ago

[image: training-loss curve]

I am curious why the training loss drops sharply at the end of each epoch, even though it appears to have converged within the epoch. @artidoro The problem is that the loss steps down again at every epoch boundary and never settles. I am running 4-bit QLoRA fine-tuning on Alpaca, with about 3000 steps per epoch. Although the authors have explained that pre-training/evaluation loss matters less than downstream task performance, it is common sense to expect the fine-tuning loss to converge cleanly. Does anyone else have this problem?
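Not from the issue itself, but a minimal sketch of how one could verify that the sudden drops really coincide with epoch boundaries: compare the mean logged loss in a short window just before each boundary with the mean just after it. The loss values, the epoch length, and the `epoch_boundary_drops` helper are all hypothetical stand-ins for the logs produced by a real training run.

```python
def epoch_boundary_drops(losses, steps_per_epoch, window=5):
    """Return (boundary_step, drop) pairs, where `drop` is the mean loss
    over `window` steps before an epoch boundary minus the mean over
    `window` steps after it. Large positive drops reproduce the step-down
    pattern described in the issue."""
    drops = []
    for boundary in range(steps_per_epoch, len(losses), steps_per_epoch):
        # Skip boundaries too close to the ends of the log to average over.
        if boundary - window < 0 or boundary + window > len(losses):
            continue
        before = sum(losses[boundary - window:boundary]) / window
        after = sum(losses[boundary:boundary + window]) / window
        drops.append((boundary, before - after))
    return drops

# Synthetic log: loss plateaus within each "epoch" of 10 steps, then
# steps down at the boundary -- the shape the issue describes.
losses = [1.0] * 10 + [0.6] * 10 + [0.3] * 10
print(epoch_boundary_drops(losses, steps_per_epoch=10))
```

If the reported drops are large at exactly the epoch boundaries and near zero elsewhere, the curve shape is the boundary effect under discussion (often attributed to the model re-seeing memorized examples on the second pass) rather than ordinary noise.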