Yifan-Song793 / ETO

Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
https://arxiv.org/abs/2403.02502

Performance with LoRA Finetuning #1

Closed Yu-Fangxu closed 6 months ago

Yu-Fangxu commented 6 months ago

Hi, thanks for your wonderful work! I noticed that you fine-tuned the LLMs with 8 A100 GPUs. Have you ever tried training with LoRA to reduce the computational resource requirements? Thanks~

Yifan-Song793 commented 6 months ago

Hi, Fangxu! Thanks for the question! Our experiments were conducted in a full-parameter fine-tuning setting. In fact, 4 A100 80GB GPUs are enough for our 7B experiments, including both SFT and DPO. To use LoRA in your training, you will need to modify fastchat/train/train.py and fastchat/train/train_dpo.py. You can refer to fastchat/train/train_lora.py for a reference implementation of integrating LoRA with FastChat; a rough sketch is shown below.
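For reference, here is a minimal sketch of what the LoRA wrapping inside those training scripts might look like using the `peft` library. The model name, rank, alpha, and target modules below are illustrative assumptions, not the repository's actual settings; fastchat/train/train_lora.py remains the authoritative pattern for FastChat.

```python
# Minimal sketch: wrap the base model with PEFT LoRA adapters before training.
# All hyperparameters here are placeholders, not the values used in the paper.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; substitute the checkpoint you actually train on.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor applied to the LoRA updates
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (illustrative)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Freeze the base weights and inject trainable LoRA adapters.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # shows how few parameters remain trainable
```

After wrapping, the resulting `model` can be passed to the existing SFT or DPO trainer in place of the full model, which is what makes LoRA attractive when fewer or smaller GPUs are available.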