Open ello0211 opened 10 months ago
Hi, thanks for your great work! When I try to reproduce the results on the commonsense reasoning datasets, they turn out not to be as good as in the table. The setup I use is the same as for the math reasoning tasks shown in the README. Could you tell me whether I am using the right setup, or show me the right way to reproduce the accuracy in the table? Thank you so much!
Hi,
The setup is a little bit different. I listed the commands below.
For LoRA:
CUDA_VISIBLE_DEVICES=0 python finetune.py \
--base_model 'yahma/llama-7b-hf' \
--data_path 'commonsense_170k.json' \
--output_dir './trained_models/llama-7b-lora-commonsense/' \
--batch_size 16 \
--micro_batch_size 4 \
--num_epochs 3 \
--learning_rate 3e-4 \
--cutoff_len 256 \
--val_set_size 120 \
--eval_step 80 \
--save_step 80 \
--adapter_name lora \
--target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
--lora_r 32 \
--lora_alpha 64
For Series Adapter:
CUDA_VISIBLE_DEVICES=0 python finetune.py \
--base_model 'yahma/llama-7b-hf' \
--data_path 'commonsense_170k.json' \
--output_dir './trained_models/llama-7b-bottleneck-commonsense/' \
--batch_size 16 \
--micro_batch_size 4 \
--num_epochs 3 \
--learning_rate 3e-4 \
--cutoff_len 256 \
--val_set_size 120 \
--eval_step 80 \
--save_step 80 \
--adapter_name bottleneck \
--target_modules '["down_proj"]'
For Parallel Adapter:
CUDA_VISIBLE_DEVICES=0 python finetune.py \
--base_model 'yahma/llama-7b-hf' \
--data_path 'commonsense_170k.json' \
--output_dir './trained_models/llama-7b-parallel-commonsense/' \
--batch_size 16 \
--micro_batch_size 4 \
--num_epochs 3 \
--learning_rate 3e-4 \
--cutoff_len 256 \
--val_set_size 120 \
--eval_step 80 \
--save_step 80 \
--adapter_name bottleneck \
--use_parallel_adapter \
--target_modules '["up_proj", "down_proj"]'
OK! I will try these later, thanks a lot.
@ello0211 Hi, did you manage to get the same results as reported in the table? Thanks!
Sorry, I didn't run the experiment with exactly the parameters you provided. I used LoRA on q_proj and v_proj with r=4 and obtained slightly inferior results. By the way, it seems that configuring LoRA as you suggested would result in a large number of trainable parameters, right?
Hi, with r=32, the number of LoRA parameters should be 8 times that of r=4 for the same target modules, since the count scales linearly with r.
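As a rough sanity check on the parameter counts, here is a back-of-the-envelope sketch, assuming the standard LLaMA-7B shapes (hidden size 4096, MLP intermediate size 11008, 32 layers); each adapted weight of shape d_in x d_out adds r*(d_in + d_out) trainable LoRA parameters:
# Back-of-the-envelope LoRA parameter counts (assumed LLaMA-7B shapes).
# q_proj/k_proj/v_proj are 4096x4096; up_proj is 4096x11008; down_proj is 11008x4096.
r=32
per_layer=$(( 3 * r * (4096 + 4096) + 2 * r * (4096 + 11008) ))
echo $(( 32 * per_layer ))             # 56098816, ~56.1M: r=32, five target modules
r=4
echo $(( 32 * 2 * r * (4096 + 4096) )) # 2097152, ~2.1M: r=4, q_proj and v_proj only
So under these assumptions, the r=32 five-module configuration trains roughly 27 times as many parameters as the r=4 q/v-only one; the 8x factor accounts for the rank alone, and the rest comes from the extra target modules.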
@HZQ950419 I fine-tuned on commonsense_170k.json with LoRA following your script, changing only eval_step and save_step:
CUDA_VISIBLE_DEVICES=0 python finetune.py \
--base_model 'yahma/llama-7b-hf' \
--data_path 'commonsense_170k.json' \
--output_dir $output_path \
--batch_size 16 \
--micro_batch_size 4 \
--num_epochs 3 \
--learning_rate 3e-4 \
--cutoff_len 256 \
--val_set_size 120 \
--eval_step 80 \
--save_step 80 \
--adapter_name lora \
--target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
--lora_r 32 \
--lora_alpha 64
And evaluated with this script:
CUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \
--model LLaMA-7B \
--adapter LoRA \
--dataset $dataset \
--batch_size 4 \
--base_model 'yahma/llama-7b-hf' \
--lora_weights $weights_path
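To cover all eight commonsense benchmarks, I ran the evaluation in a loop like the one below (the dataset names are my assumption based on the repo's dataset folders and may need adjusting):
# Hypothetical loop over the commonsense benchmarks; dataset names assumed.
for dataset in boolq piqa social_i_qa hellaswag winogrande ARC-Easy ARC-Challenge openbookqa; do
  CUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \
    --model LLaMA-7B \
    --adapter LoRA \
    --dataset $dataset \
    --batch_size 4 \
    --base_model 'yahma/llama-7b-hf' \
    --lora_weights $weights_path
done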
But I still couldn't reproduce the accuracy reported in the table: only 0.6715 for boolq and 0.3884 for piqa. Can you help me check the problem? Also, if I want to reproduce the LLaMA-13B results on commonsense_170k.json, how should I set the parameters? Thank you so much!
Hi, the command is the same as the one we use. Are you using multiple GPUs for fine-tuning? If so, maybe try fine-tuning on a single GPU, as some other researchers also couldn't reproduce the results with multi-GPU training.