AGI-Edgerunners / LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
https://arxiv.org/abs/2304.01933
Apache License 2.0

about loss #65

Open haoyuwangwhy opened 5 months ago

haoyuwangwhy commented 5 months ago

Hi, I am trying to fine-tune LLaMA on commonsense_170k. However, I find that once the loss value reaches around 0.6, it almost stops decreasing. Is this normal?

```bash
WORLD_SIZE=2 CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 finetune.py --base_model 'yahma/llama-7b-hf' --data_path './LLM-Adapters/ft-training_set/commonsense_170k.json' --output_dir './trained_models/llama-sparselora-commonsense_new' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 0 --adapter_name lora --lora_r=32 --lora_target_modules=["k_proj","q_proj","v_proj","down_proj","up_proj"] --lora_alpha=64
```
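One way to see whether the loss has actually plateaued is to read the `trainer_state.json` that the Hugging Face `Trainer` writes into each checkpoint folder under the output directory. This is a minimal sketch, assuming the run above produced `checkpoint-*` directories under `./trained_models/llama-sparselora-commonsense_new`; the paths are illustrative.

```python
import json
from pathlib import Path

# Output directory passed as --output_dir in the finetune.py command above
# (adjust to the actual location on your machine).
output_dir = Path("./trained_models/llama-sparselora-commonsense_new")

# The Trainer writes trainer_state.json inside each checkpoint-* folder;
# pick the most recent checkpoint.
checkpoints = sorted(
    output_dir.glob("checkpoint-*"), key=lambda p: int(p.name.split("-")[-1])
)
state_file = checkpoints[-1] / "trainer_state.json"

with open(state_file) as f:
    state = json.load(f)

# log_history has one entry per logging step; training entries carry "loss".
for entry in state["log_history"]:
    if "loss" in entry:
        print(f"step {entry['step']:>6}: loss {entry['loss']:.4f}")
```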

HZQ950419 commented 4 months ago

Hi @haoyuwangwhy ,

The loss can differ from dataset to dataset, and this value looks normal. You can evaluate the performance of the trained model to check whether the training worked well.
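For a quick spot-check of the trained adapter (this is not the repo's evaluation script, just a sketch using the standard `transformers` and `peft` APIs; the adapter path and the prompt are assumptions based on the command above):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "yahma/llama-7b-hf"
# Assumed path: the --output_dir used in the finetune.py command above.
adapter_path = "./trained_models/llama-sparselora-commonsense_new"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto"
)
# Load the trained LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()

# Illustrative commonsense-style prompt; a proper evaluation should run the
# repo's evaluation script on the benchmark test sets.
prompt = (
    "Please answer the following question with true or false.\n"
    "Question: Do people use umbrellas when it rains?\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```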