AGI-Edgerunners / LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
https://arxiv.org/abs/2304.01933
Apache License 2.0

Couldn't get the same accuracy with eight commonsense reasoning datasets. #38

Open ello0211 opened 10 months ago

ello0211 commented 10 months ago

Hi, thanks for your great work! When I try to reproduce the results on the eight commonsense reasoning datasets, they don't come out as good as in the table. The settings I used are the same as for the math reasoning tasks shown in the README. Could you tell me whether these are the right settings, or show me how to reproduce the accuracy reported in the table? Thank you so much!

HZQ950419 commented 10 months ago

Hi,

The settings are a bit different; I've listed the commands below.

For LoRA:

CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-lora-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64

For Series Adapter:

CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-bottleneck-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --target_modules '["down_proj"]'

For Parallel Adapter:

CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-parallel-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --use_parallel_adapter --target_modules '["up_proj", "down_proj"]'
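
For reference, the LoRA flags above correspond roughly to the configuration sketched below (this uses the generic Hugging Face peft API for illustration; finetune.py may build the config slightly differently, and the dropout value is an assumed default rather than a command-line flag):

    # Rough equivalent of the LoRA command-line flags above, written as a standard
    # peft LoraConfig. A sketch for clarity, not the exact code path in finetune.py.
    from peft import LoraConfig, TaskType

    lora_config = LoraConfig(
        r=32,                     # --lora_r
        lora_alpha=64,            # --lora_alpha
        target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],
        lora_dropout=0.05,        # assumed default; not set on the command line
        bias="none",
        task_type=TaskType.CAUSAL_LM,
    )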

ello0211 commented 10 months ago

OK! I will try these later, thanks a lot.

nbasyl commented 7 months ago

@ello0211 Hi, did you manage to get the same result as the table reported? Thx!

ello0211 commented 7 months ago

Sorry, I didn't run the experiment with exactly the parameters you provided. Instead, I used LoRA on q_proj and v_proj with r=4 and got slightly worse results. By the way, it seems that configuring LoRA as you suggested would result in many more trainable parameters, right?

HZQ950419 commented 7 months ago

Hi, with r=32 the number of LoRA parameters should be 8 times what it would be with r=4 on the same target modules.
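
For a quick sanity check, the scaling can be estimated with a small sketch like the one below (assuming standard LoRA, where each adapted d_out x d_in projection adds r * (d_in + d_out) parameters, and using the usual LLaMA-7B dimensions):

    # Back-of-the-envelope LoRA parameter count for LLaMA-7B
    # (hidden size 4096, MLP intermediate size 11008, 32 layers).
    # Standard LoRA adds r * (d_in + d_out) parameters per adapted projection,
    # so the count scales linearly with r.
    HIDDEN, INTER, LAYERS = 4096, 11008, 32

    def lora_params(r, shapes):
        """shapes: list of (d_in, d_out) for each adapted projection in one layer."""
        return LAYERS * sum(r * (d_in + d_out) for d_in, d_out in shapes)

    attn = [(HIDDEN, HIDDEN)] * 3             # q_proj, k_proj, v_proj
    mlp = [(HIDDEN, INTER), (INTER, HIDDEN)]  # up_proj, down_proj

    print(lora_params(32, attn + mlp))  # r=32 on all five modules (the command above)
    print(lora_params(4, attn + mlp))   # r=4 on the same modules: exactly 8x fewer
    print(lora_params(4, attn[:2]))     # r=4 on q_proj and v_proj only (smaller still)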

ls559 commented 6 months ago

@HZQ950419 I fine-tuned on commonsense_170k.json with LoRA following your script, changing only eval_step and save_step:

CUDA_VISIBLE_DEVICES=0 python finetune.py \
        --base_model 'yahma/llama-7b-hf'  \
        --data_path 'commonsense_170k.json'   \
        --output_dir $output_path   \
        --batch_size 16  \
        --micro_batch_size 4   \
        --num_epochs 3   \
        --learning_rate 3e-4   \
        --cutoff_len 256   \
        --val_set_size 120 \
        --eval_step 80 \
        --save_step 80  \
        --adapter_name lora \
        --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
        --lora_r 32 \
        --lora_alpha 64 

And evaluated by this script:

CUDA_VISIBLE_DEVICES=0 python commonsense_evaluate.py \
    --model LLaMA-7B \
    --adapter LoRA \
    --dataset $dataset \
    --batch_size 4 \
    --base_model 'yahma/llama-7b-hf' \
    --lora_weights $weights_path 
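
(For completeness, a driver that sweeps $dataset over the eight tasks can look roughly like the sketch below; the dataset names are assumptions based on the README, so adjust them if they differ.)

    # Sketch of a driver that sweeps $dataset over the eight commonsense tasks.
    # The dataset names are assumptions based on the README; adjust if they differ.
    import os
    import subprocess

    datasets = ["boolq", "piqa", "social_i_qa", "hellaswag",
                "winogrande", "ARC-Easy", "ARC-Challenge", "openbookqa"]

    env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0"}
    for name in datasets:
        subprocess.run(
            ["python", "commonsense_evaluate.py",
             "--model", "LLaMA-7B",
             "--adapter", "LoRA",
             "--dataset", name,
             "--batch_size", "4",
             "--base_model", "yahma/llama-7b-hf",
             "--lora_weights", "./trained_models/llama-7b-lora-commonsense/"],  # example path
            check=True, env=env,
        )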

But I still couldn't reproduce the accuracy reported in the table: only 0.6715 on BoolQ and 0.3884 on PIQA. Can you help me check what the problem is? Also, if I want to reproduce the LLaMA-13B results on commonsense_170k.json, how should I set the parameters? Thank you so much!

HZQ950419 commented 6 months ago

Hi, the command is the same as the one we use. Are you using multiple GPUs for fine-tuning? If so, you could try fine-tuning on a single GPU, as some other researchers have also been unable to reproduce the results with multi-GPU training.
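
One thing worth checking: alpaca-lora-style scripts usually derive the gradient accumulation from batch_size, micro_batch_size, and the DDP world size, so a multi-GPU launch doesn't optimize in exactly the same way as a single-GPU one. A sketch of that common pattern (an assumption, not verified against this repo's finetune.py) is below:

    # Sketch of the usual alpaca-lora-style batch handling (assumed, not verified
    # against this repo's finetune.py): under DDP the per-process gradient
    # accumulation is divided by the world size to keep the global batch at --batch_size.
    import os

    batch_size = 16        # --batch_size
    micro_batch_size = 4   # --micro_batch_size

    world_size = int(os.environ.get("WORLD_SIZE", 1))  # set by torchrun / DDP launchers
    gradient_accumulation_steps = batch_size // micro_batch_size
    if world_size > 1:
        # Integer division here (and per-rank data sharding) means a multi-GPU run
        # is not bit-for-bit equivalent to the single-GPU schedule.
        gradient_accumulation_steps = gradient_accumulation_steps // world_size

    print(f"world_size={world_size}, grad_accum={gradient_accumulation_steps}")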