AGI-Edgerunners / LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
https://arxiv.org/abs/2304.01933
Apache License 2.0

Question about the reproduction of the results on math_10k #58

Open zeyuliu1037 opened 7 months ago

zeyuliu1037 commented 7 months ago

Hi, thank you for your awesome work!

I have one question about the training on the math_10k dataset. Here is the command I used:

python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'ft-training_set/math_10k.json' --output_dir './trained_models/llama-7b-lora-math/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64

But I only got 16.14 on AQuA and 46.9 on SVAMP, whereas the table reports 18.9 on AQuA and 52.1 on SVAMP. I'm using the peft library from the GitHub repo. Do you have any insights on this? I also noticed that even with "load_best_model_at_end=True", the best model does not seem to be loaded at the end: based on the wandb output, the final eval_loss is still the loss of the last model. Is this correct?
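For reference, here is how I understand load_best_model_at_end is supposed to be wired up (just a sketch with the standard Hugging Face TrainingArguments; the actual setup inside finetune.py may differ):

```python
from transformers import TrainingArguments

# Sketch only: load_best_model_at_end needs the eval and save strategies to match,
# and with "steps" the save_steps should be a multiple of eval_steps.
args = TrainingArguments(
    output_dir="./trained_models/llama-7b-lora-math/",
    evaluation_strategy="steps",
    eval_steps=80,
    save_strategy="steps",
    save_steps=80,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```

(If I understand correctly, the best weights are only restored after training finishes, so the last eval_loss logged to wandb during training would still come from the final evaluation rather than the best checkpoint.)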

Thank you so much in advance.

HZQ950419 commented 7 months ago

Hi,

Can I ask if you used multiple GPUs for training? If yes, please try with a single GPU.

zeyuliu1037 commented 7 months ago

I use a single GPU.

Zhenyu001225 commented 5 months ago

> Hi, thank you for your awesome work!
>
> I have one question about the training on the math_10k dataset. Here is the command I used:
>
> python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'ft-training_set/math_10k.json' --output_dir './trained_models/llama-7b-lora-math/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64
>
> But I only got 16.14 on AQuA and 46.9 on SVAMP, whereas the table reports 18.9 on AQuA and 52.1 on SVAMP. I'm using the peft library from the GitHub repo. Do you have any insights on this? I also noticed that even with "load_best_model_at_end=True", the best model does not seem to be loaded at the end: based on the wandb output, the final eval_loss is still the loss of the last model. Is this correct?
>
> Thank you so much in advance.

Hi, did you solve this problem? My results are close to yours.

zeyuliu1037 commented 5 months ago

> Hi, did you solve this problem? My results are close to yours.

Unfortunately, I haven't solved it yet.

Zhenyu001225 commented 5 months ago

> Hi, did you solve this problem? My results are close to yours.
>
> Unfortunately, I haven't solved it yet.

You can use transformers==4.35.0. With that version, the results will be close to the authors'.

zeyuliu1037 commented 5 months ago

Thank you so much!!!

Aradhye2002 commented 5 months ago

@Zhenyu001225 any idea why this happens? An extreme case is transformers 4.40.0, which gave me gibberish output, as mentioned in this issue.

Thanks

Zhenyu001225 commented 5 months ago

> @Zhenyu001225 any idea why this happens? An extreme case is transformers 4.40.0, which gave me gibberish output, as mentioned in this issue.
>
> Thanks

I think it's because of the tokenizer version. For math, you can try:

CUDA_VISIBLE_DEVICES=1 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path './ft-training_set/math_10k.json' \
  --output_dir './trained_models/llama-7b-lora-math/' \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256 \
  --val_set_size 0 \
  --eval_step 80 \
  --save_step 80 \
  --adapter_name lora \
  --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
  --lora_r 32 \
  --lora_alpha 64

For commonsense:

CUDA_VISIBLE_DEVICES=8 python finetune.py \
  --base_model 'yahma/llama-7b-hf' \
  --data_path 'ft-training_set/commonsense_170k.json' \
  --output_dir './trained_models/llama-7b-lora-commonsense/' \
  --batch_size 16 \
  --micro_batch_size 4 \
  --num_epochs 3 \
  --learning_rate 3e-4 \
  --cutoff_len 256 \
  --val_set_size 120 \
  --eval_step 80 \
  --save_step 80 \
  --adapter_name lora \
  --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' \
  --lora_r 32 \
  --lora_alpha 64

zeyuliu1037 commented 5 months ago

> @Zhenyu001225 any idea why this happens? An extreme case is transformers 4.40.0, which gave me gibberish output, as mentioned in this issue. Thanks
>
> I think it's because of the tokenizer version. For math, you can try:
>
> CUDA_VISIBLE_DEVICES=1 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path './ft-training_set/math_10k.json' --output_dir './trained_models/llama-7b-lora-math/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 0 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64
>
> For commonsense:
>
> CUDA_VISIBLE_DEVICES=8 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'ft-training_set/commonsense_170k.json' --output_dir './trained_models/llama-7b-lora-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64

Hi, could you kindly share your requirements.txt with the exact versions? I think that besides the transformers version, the versions of accelerate and tokenizers also affect the results. Thank you so much!
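In the meantime, here is the quick sketch I use to dump the versions that seem most relevant (adjust the package list as needed):

```python
# Print the installed versions of the packages most likely to affect results.
import torch
import transformers
import tokenizers
import accelerate
import peft

for name, module in [
    ("torch", torch),
    ("transformers", transformers),
    ("tokenizers", tokenizers),
    ("accelerate", accelerate),
    ("peft", peft),
]:
    print(f"{name}=={module.__version__}")
```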

ZeguanXiao commented 5 months ago

@Zhenyu001225 When switching to transformers 4.35.0, the training is very unstable: the training loss goes to 0 and the validation loss goes to NaN. Do you have the same problem?

YYing0111 commented 4 months ago

> @Zhenyu001225 When switching to transformers 4.35.0, the training is very unstable: the training loss goes to 0 and the validation loss goes to NaN. Do you have the same problem?

Hi, I have the same problem. Did you solve it?

ZeguanXiao commented 4 months ago

@YYing0111 Try installing transformers with pip install git+https://github.com/yizhongw/transformers.git@left_padding
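Guessing from the branch name, the fix is related to padding side. As a quick sanity check (a sketch assuming a standard Hugging Face tokenizer, not necessarily what that branch changes), you can verify that batched generation pads on the left:

```python
from transformers import AutoTokenizer

# Decoder-only models like LLaMA generally need left padding for batched generation.
tok = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")
print(tok.padding_side)
tok.padding_side = "left"
```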

pkhanna7 commented 4 months ago

Hi, I finetuned the Llama-7B model using LoRA with math_10k on a single A100 GPU with transformers==4.35.0, but still got a much lower accuracy (27.2%) on SVAMP compared to the reported numbers (52.1%).

From a manual analysis of the generated responses, it seems that the model generates a lot of irrelevant code after finishing its reasoning steps. The final answer for the math datasets is extracted as the last float number in the response, so when random code is appended at the end, the numeric answer is picked from the gibberish text instead of the actual answer, which lowers the accuracy.

Here's an example:

  1. Add the number of pages of math homework and reading homework: 5 + 2 = 7
  2. Subtract the number of pages of reading homework from the number of pages of math homework: 5 - 2 = 3
  3. The difference between the number of pages of math homework and reading homework is 3.

Therefore, Rachel had 3 more pages of math homework than reading homework. The answer in Arabic numerals is 3. \\Tags: javascript, jquery, html, css, twitter-bootstrap

Question: How to add a class to an element when another element's value changes?

I'm trying to add a class to an element when another element's value changes. Here's an example of what I'm trying to do:

\begin{code} input type="text" class="form-control" id="amount" value="100" input type="text" class="form-control" id="amount" value="200" input type="text" class="form-control" id="amount" value="300"

prediction: 300.0
label: 3.0

Here it treats 300 as the answer since that's the last number in the generated response, even though Llama's actual reasoning in the first half of the generation is correct. Does anyone know how to fix this? Thanks!
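For reference, the extraction seems to be a plain "last number in the response" heuristic; here is a rough sketch of what I mean (not necessarily the exact logic in the repo's evaluation script):

```python
import re

def extract_last_number(response: str):
    """Return the last integer/float that appears in the generated text, if any."""
    numbers = re.findall(r"-?\d+\.?\d*", response)
    return float(numbers[-1]) if numbers else None

clean = "The answer in Arabic numerals is 3."
with_gibberish = clean + ' \\Tags: javascript ... value="300"'
print(extract_last_number(clean))           # 3.0 -> correct
print(extract_last_number(with_gibberish))  # 300.0 -> the gibberish tail wins
```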

Edit: Also, here's my fine-tuning command:

CUDA_VISIBLE_DEVICES=7 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'ft-training_set/math_10k.json' --output_dir './trained_models/llama-7b-lora-math-single-gpu-old-transformers/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64 > finetune_llama7_singlegpu_old_transformers.txt