OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
https://optimalscale.github.io/LMFlow/
Apache License 2.0

Discussion about LISA #833

Open caoshuai03 opened 1 month ago

caoshuai03 commented 1 month ago

In the paper, only a comparison of the average weight norm of each layer during LoRA fine-tuning is given.

  1. What if the layers' weights already differ before fine-tuning?
  2. If a weighted averaging method were used, could the intermediate iteration process turn out differently?
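
To make the quantity in question concrete, here is a minimal sketch (not the paper's code) of computing the mean weight norm per transformer layer with Hugging Face `transformers`; the model name `gpt2` and the mean-of-parameter-norms definition are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM

# "gpt2" is a small stand-in model; the paper analyzes larger LLMs.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Mean Frobenius norm of the parameters in each transformer block.
layer_norms = {}
for i, block in enumerate(model.transformer.h):
    norms = [p.detach().norm() for p in block.parameters()]
    layer_norms[i] = torch.stack(norms).mean().item()

for i, n in layer_norms.items():
    print(f"layer {i:2d}: mean weight norm = {n:.4f}")
```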
research4pan commented 4 weeks ago

Thanks for your interest in LMFlow and LISA!

Regarding the first question, we conducted the fine-tuning with the same seed and the same base model, so their initial weights should be identical before fine-tuning.
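
As a minimal illustration of this point (not the authors' actual training script), fixing the global seed before initialization makes any randomly initialized trainable weights, such as freshly added adapter matrices, identical across runs; the shapes below are arbitrary:

```python
import torch
from transformers import set_seed

def init_weights(seed: int) -> torch.Tensor:
    # Stand-in for any randomly initialized trainable weights
    # (e.g., a freshly added LoRA adapter matrix).
    set_seed(seed)
    return torch.nn.Linear(768, 8).weight.detach().clone()

print(torch.equal(init_weights(42), init_weights(42)))  # True: same seed -> same initial weights
```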

As for the second question, I think weighted averaging methods are certainly different from the normal training process. But since they are less frequently adopted in the practice of fine-tuning LLMs, we didn't conduct experiments on them. To draw insights from weighted averaging methods, we think at least two sets of experiments would be needed if anyone is interested in this aspect:

Hope this information can be helpful 😄
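
For readers unfamiliar with the term, one common instance of a weighted averaging method is keeping a running average of the model weights alongside training; the sketch below uses PyTorch's `torch.optim.swa_utils.AveragedModel` with a toy model and a dummy objective, purely for illustration:

```python
import torch
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Linear(16, 16)        # toy stand-in for an LLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Keep a running (equal-weight) average of the parameters during training;
# AveragedModel also accepts custom averaging functions, e.g. an EMA.
avg_model = AveragedModel(model)

for step in range(100):
    x = torch.randn(4, 16)
    loss = model(x).pow(2).mean()      # dummy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    avg_model.update_parameters(model) # fold current weights into the average
```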