OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0

Weird Loss Curve #831

Open · Zihang-Xu-2002 opened this issue 1 month ago

Zihang-Xu-2002 commented 1 month ago

I trained Llama-3 on my own conversation dataset with the command:

```bash
./scripts/run_finetune.sh \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --dataset_path data/alpaca_selected/train \
  --conversation_template llama3 \
  --output_model_path output_models/finetuned_llama3_8b_selected
```

The initial learning rate is 2e-5 and the per-device batch size is 4. I found that the loss drops sharply at the beginning of every epoch, but within an epoch there is no obvious decrease.

[screenshot: training loss curve for Llama-3]

Before this I trained Llama-2 with:

```bash
./scripts/run_finetune.sh \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --dataset_path data/alpaca_raw/train \
  --conversation_template llama2 \
  --output_model_path output_models/finetuned_llama2_7b_raw
```

The initial learning rate is 8e-6 and the per-device batch size is 4. The loss curve looks like this:

[screenshot: training loss curve for Llama-2]

I am not sure whether gradient accumulation causes this. I set "gradient_accumulation_steps" in configs/ds_config_zero3.json to 1, but nothing changed.
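Concretely, the edit in configs/ds_config_zero3.json is just this key (only the edited fragment is shown as a sketch; the rest of the ZeRO-3 config is unchanged):

```json
{
  "gradient_accumulation_steps": 1
}
```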

[screenshot: training loss curve after setting gradient_accumulation_steps to 1]

Could you help me with this issue? Thank you for your time and attention.

research4pan commented 1 month ago

Thanks for your interest in LMFlow! We've observed similar loss curves in some of our experiments. After careful examination, we attributed this to overfitting of the instruction-following dataset on Llama models. The flat loss curve within each epoch may come from the large variance of the dataset; decreasing the learning rate or increasing the batch size should help, though the overall tendency should remain the same.
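For example, a re-run with a lower learning rate and a larger per-device batch size could look roughly like the sketch below. The flag names `--learning_rate` and `--per_device_train_batch_size` come from the underlying HuggingFace `TrainingArguments`; whether `run_finetune.sh` forwards them unchanged depends on your LMFlow version, so please check the script first.

```bash
# Sketch only: flag names follow HuggingFace TrainingArguments;
# confirm that run_finetune.sh forwards them in your LMFlow version.
./scripts/run_finetune.sh \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --dataset_path data/alpaca_selected/train \
  --conversation_template llama3 \
  --output_model_path output_models/finetuned_llama3_8b_lowlr \
  --learning_rate 1e-5 \
  --per_device_train_batch_size 8
```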

You may also check your evaluation/test results; if they look normal, then it may not be a serious issue 😄
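For a quick sanity check, the repository's evaluation script can be run on a held-out split. The invocation below is only a sketch: the script name, the `--metric` value, and the test dataset path are assumptions, so please check the `scripts/` directory of your LMFlow version for the exact interface.

```bash
# Sketch only: script name, flags, and dataset path are assumptions;
# see the scripts/ directory in the LMFlow repo for the exact interface.
./scripts/run_evaluation.sh \
  --model_name_or_path output_models/finetuned_llama3_8b_selected \
  --dataset_path data/alpaca_selected/test \
  --metric accuracy
```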