Hi,
Compared to the huggingface peft library, we add the Series Adapter and the Parallel Adapter (a rough sketch of the two placements is shown after the commands below). The training settings may also differ; we use the following commands to train adapters on commonsense reasoning (for math reasoning, just replace the data_path):
For LoRA:
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-lora-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name lora --target_modules '["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"]' --lora_r 32 --lora_alpha 64
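For comparison with the standard huggingface peft API, those LoRA hyper-parameters correspond roughly to the config below. This is only a sketch for reference: the dropout value is illustrative, and finetune.py may wire things differently.

```python
# Rough peft-equivalent of the LoRA settings in the command above (sketch only).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("yahma/llama-7b-hf")
lora_config = LoraConfig(
    r=32,                                  # --lora_r 32
    lora_alpha=64,                         # --lora_alpha 64
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,                     # illustrative; the actual value in finetune.py may differ
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```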
For Series Adapter:
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-bottleneck-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --target_modules '["down_proj"]'
For Parallel Adapter:
CUDA_VISIBLE_DEVICES=0 python finetune.py --base_model 'yahma/llama-7b-hf' --data_path 'commonsense_170k.json' --output_dir './trained_models/llama-7b-parallel-commonsense/' --batch_size 16 --micro_batch_size 4 --num_epochs 3 --learning_rate 3e-4 --cutoff_len 256 --val_set_size 120 --eval_step 80 --save_step 80 --adapter_name bottleneck --use_parallel_adapter --target_modules '["up_proj", "down_proj"]'
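Conceptually, --adapter_name bottleneck inserts a series adapter after each target module, while --use_parallel_adapter runs the adapter alongside it. The sketch below is simplified (bottleneck size, activation, and the absence of dropout/scaling are illustrative) and is not the exact code in our modified peft:

```python
# Minimal sketch of a series (bottleneck) vs. parallel adapter around one target linear layer.
import torch
import torch.nn as nn

class AdaptedLinear(nn.Module):
    """Wraps a base linear layer with either a series or a parallel bottleneck adapter."""

    def __init__(self, base: nn.Linear, bottleneck: int = 64, parallel: bool = False):
        super().__init__()
        self.base = base
        self.parallel = parallel
        # The series adapter sees the base layer's output; the parallel adapter sees its input.
        in_dim = base.in_features if parallel else base.out_features
        self.down = nn.Linear(in_dim, bottleneck)
        self.up = nn.Linear(bottleneck, base.out_features)
        self.act = nn.ReLU()

    def forward(self, x):
        h = self.base(x)
        if self.parallel:
            # Parallel adapter: branch runs on the same input as the base layer, added to its output.
            return h + self.up(self.act(self.down(x)))
        # Series adapter: applied after the base layer, with a residual connection.
        return h + self.up(self.act(self.down(h)))

# Tiny usage example on a down_proj-shaped layer (LLaMA-7B: 11008 -> 4096).
layer = nn.Linear(11008, 4096)
series = AdaptedLinear(layer, parallel=False)
parallel = AdaptedLinear(layer, parallel=True)
x = torch.randn(2, 11008)
print(series(x).shape, parallel(x).shape)  # both torch.Size([2, 4096])
```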
That was helpful. Thanks for the quick response!
Thanks for sharing your code! When replicating the results of the README, I get lower results when using the official huggingface peft library. Could you please provide some details on the changes you made to your version of peft?