Hi, I am trying to finetune LLaMA on commonsense_170k. However, I find that once the loss value reaches around 0.6, it almost stops decreasing. Is this normal?
The loss can differ across datasets, and this value looks normal. To check whether the training worked well, evaluate the performance of the trained model.
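One quick way to do that is to load the saved LoRA adapter on top of the base model and spot-check a few generations. Below is a minimal sketch, assuming `transformers` and `peft` are installed and the adapter was saved to the `--output_dir` used in the command further down; the prompt is a made-up commonsense-style example, not the exact template `finetune.py` uses.

```python
# Minimal post-training sanity check: load the base model, attach the
# trained LoRA adapter, and generate an answer for one sample prompt.
# Assumption: adapter_dir matches the --output_dir passed to finetune.py.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "yahma/llama-7b-hf"
adapter_dir = "./trained_models/llama-sparselora-commonsense_new"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto"
)
# Attach the trained LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, adapter_dir)
model.eval()

# Hypothetical commonsense-style prompt, only for a quick smoke test.
prompt = (
    "Please choose the correct answer to the question: "
    "Where would you put a plate after washing it?\n"
    "Answer1: dishwasher Answer2: cupboard Answer3: table\n"
    "Answer format: answer1/answer2/answer3\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If the generations look coherent and follow the answer format, the plateaued loss is most likely just the dataset's floor rather than a training problem.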
For reference, here is the command I am using:

WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 finetune.py \
    --base_model 'yahma/llama-7b-hf' \
    --data_path './LLM-Adapters/ft-training_set/commonsense_170k.json' \
    --output_dir './trained_models/llama-sparselora-commonsense_new' \
    --batch_size 16 --micro_batch_size 4 --num_epochs 3 \
    --learning_rate 3e-4 --cutoff_len 256 --val_set_size 0 \
    --adapter_name lora --lora_r=32 \
    --lora_target_modules=["k_proj","q_proj","v_proj","down_proj","up_proj"] \
    --lora_alpha=64
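To see whether the loss has truly plateaued rather than just flattened at a noisy scale, you can also inspect the logged loss values directly. A minimal sketch, assuming `finetune.py` uses the Hugging Face Trainer and therefore writes a `trainer_state.json` (containing a `log_history` list) into each checkpoint directory under `--output_dir`; the `checkpoint-1000` path below is a hypothetical example.

```python
# Print the training loss at each logging step from a Trainer checkpoint.
# Assumption: the path points at a real checkpoint inside --output_dir.
import json

state_path = (
    "./trained_models/llama-sparselora-commonsense_new/"
    "checkpoint-1000/trainer_state.json"
)
with open(state_path) as f:
    state = json.load(f)

# Each log_history entry with a "loss" key is one logging step.
for entry in state["log_history"]:
    if "loss" in entry:
        print(f'step {entry["step"]:>6}  loss {entry["loss"]:.4f}')
```

If the curve is still drifting down slowly, more epochs or a learning-rate tweak may help; if it is flat over many steps, evaluating the model as suggested above is the better check.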