Zhenyu001225 opened 2 months ago:
And for PIQA the result is 74.6, compared with 80.7 in the table. For SIQA the result is 60.8, compared with 77.4 in the table. Should I fine-tune again, or adjust any of the hyperparameters?
Hi, may I ask whether you have solved this issue?
BTW, I find that a larger batch size leads to some bad outputs during evaluation, while bsz=1 does not.
@wutaiqiang Yes, I also see this problem. bsz=1 fixes most cases, but it can still produce bad outputs for some examples.
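A likely explanation (an assumption, not confirmed by the repo): decoder-only LLaMA models need left padding for batched generation; with right padding the model continues generating after pad tokens and the decoded answers look broken. A minimal sketch with Hugging Face transformers, using hypothetical prompts:

```python
# Sketch: batched generation with a decoder-only model needs LEFT padding
# (assumed cause of the bad outputs reported above with batch size > 1).
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")
tokenizer.pad_token_id = 0          # LLaMA has no pad token by default
tokenizer.padding_side = "left"     # crucial for batched generation

model = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf", device_map="auto")

prompts = ["Question: ...", "A second, longer question: ..."]  # hypothetical prompts
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```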
In my case, the results are even better than reported. You should use a single GPU for fine-tuning.
| boolq | piqa | social_i_qa | hellaswag | winogrande | ARC-Easy | ARC-Challenge | openbookqa |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 69.44 | 80.79 | 79.32 | 84.2 | 81.61 | 80.34 | 64.93 | 76.8 |
When I'm doing the evaluation, should I use --load_8bit? I'm trying to reproduce the results of LLaMA-7B-LoRA.
Finetune:
CUDA_VISIBLE_DEVICES=8 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path './ft-training_set/commonsense_170k.json' \
  --output_dir './trained_models/llama-7b-lora-commonsense/' \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256 \
  --val_set_size 120 \
  --eval_step 80 \
  --save_step 80 \
  --adapter_name lora \
  --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
  --lora_r 32 \
  --lora_alpha 64
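For reference, these flags roughly correspond to a PEFT LoRA configuration like the sketch below. This is an assumption about what finetune.py builds internally; the repo's script may use different defaults for dropout, bias, and other options.

```python
# Rough PEFT equivalent of the LoRA flags above (a sketch, not the repo's code).
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf")
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=32,                      # --lora_r 32
    lora_alpha=64,             # --lora_alpha 64
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,         # assumed value, not set on the command line
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```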
Evaluate:
CUDA_VISIBLE_DEVICES=3 python commonsense_evaluate.py \
  --model LLaMA-7B \
  --adapter LoRA \
  --dataset boolq \
  --batch_size 1 \
  --base_model 'yahma/llama-7b-hf' \
  --lora_weights './trained_models/llama-7b-lora-commonsense/'
But the result is only 57.5, compared with 68.9 in the table. Could you provide me with some insights here?
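For what it's worth, --load_8bit roughly corresponds to loading the base model in 8-bit before attaching the LoRA adapter, as in the sketch below. This is an assumption about what commonsense_evaluate.py does internally; 8-bit quantization can itself shift scores slightly, so evaluating in fp16 first is a reasonable baseline.

```python
# Sketch of loading the fine-tuned adapter for evaluation (assumed to mirror
# the evaluation script; paths are the ones used in the commands above).
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = "yahma/llama-7b-hf"
lora_weights = "./trained_models/llama-7b-lora-commonsense/"

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=False,        # set True to mimic --load_8bit
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, lora_weights, torch_dtype=torch.float16)
model.eval()
```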