foundation-model-stack / fms-hf-tuning

🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
Apache License 2.0

Support str in target_modules for LoraConfig #37

Closed anhuong closed 7 months ago

anhuong commented 7 months ago

Request

LoraConfig can accept a List or a str for target_modules, as seen in the description below. Supporting this in fms-hf-tuning would make it possible to pass "all-linear" instead of listing specific attention layers.

Context

target_modules (Optional[Union[List[str], str]]) — The names of the modules to apply the adapter to. If this is specified, only the modules with the specified names will be replaced. When passing a string, a regex match will be performed. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings. If this is specified as 'all-linear', then all linear/Conv1D modules are chosen, excluding the output layer. If this is not specified, modules will be chosen according to the model architecture. If the architecture is not known, an error will be raised — in this case, you should specify the target modules manually.
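
For reference, the two accepted forms look roughly like this when calling PEFT's LoraConfig directly (illustrative values only, not fms-hf-tuning code):

from peft import LoraConfig

# Explicit list of module names: exact or suffix matching is performed.
config_list = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])

# Special string "all-linear": targets all linear/Conv1D modules except the output layer.
config_all_linear = LoraConfig(r=8, lora_alpha=16, target_modules="all-linear")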

The fms-hf-tuning LoraConfig currently accepts only a List: target_modules: List[str] = field(default_factory=lambda: ["q_proj", "v_proj"]). As a result, if one tries to pass all-linear it is interpreted as a single-element List.

Example

$ python tuning/sft_trainer.py --target_modules "all-linear"

# interpreted as
LoraConfig(r=8, lora_alpha=16, target_modules=['all-linear'], lora_dropout=0.05)

# subsequently gets used in SFTTrainer as
LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type='CAUSAL_LM', inference_mode=False, r=8, target_modules='all-linear', lora_alpha=16, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}) 

# errors with
ValueError: Target modules {'all-linear'} not found in the base model. Please check the target modules and try again.

I tried setting target_modules: Union[List[str], str] = field(default_factory=lambda: ["q_proj", "v_proj"]); however, "all-linear" was still interpreted as a List instead of a string. This is likely due to how the command-line arguments are parsed.
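
That behaviour can be reproduced with a small stand-alone sketch (ToyLoraArgs below is a hypothetical stand-in for the fms-hf-tuning dataclass, assuming transformers' HfArgumentParser): when a Union contains str alongside another type, the parser keeps the non-str member, so the value still comes back as a list.

from dataclasses import dataclass, field
from typing import List, Union

from transformers import HfArgumentParser


@dataclass
class ToyLoraArgs:
    # Hypothetical stand-in for the fms-hf-tuning LoraConfig dataclass.
    target_modules: Union[List[str], str] = field(
        default_factory=lambda: ["q_proj", "v_proj"]
    )


parser = HfArgumentParser(ToyLoraArgs)
(args,) = parser.parse_args_into_dataclasses(["--target_modules", "all-linear"])
print(args.target_modules)  # ['all-linear'] -- still a list, not the string "all-linear"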

Note that all-linear is supported as of PEFT v0.8.0, so the pinned PEFT dependency must be upgraded accordingly.

Acceptance criteria

  1. We should be able to pass target_modules = 'all-linear' to fms-hf-tuning from the command line for LoRA tuning.
  2. The change has been tested with at least one model (e.g. llama-7b) and does not crash.

Ssukriti commented 7 months ago

Assigned to Vassilis Vassiliadis.

VassilisVassiliadis commented 7 months ago

sft_trainer.py uses transformers.HFArgumentParser to parse its command-line arguments into dataclass objects. We wouldn't want to change that by using a different parser just for this command-line parameter.

Therefore, I think the most straightforward approach here is to keep HfArgumentParser as-is and post-process the parsed target_modules value after argument parsing (see the sketch after this comment).

This will enable:

  1. handling --target_modules=all-linear
  2. programmatically invoking the train() method with target_modules=None

while retaining the current behaviour of providing a list of layers for LoRA to attach to.
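
A minimal sketch of one possible post-processing step, assuming the dataclass keeps target_modules as a List[str] after parsing; the helper name and its call site are hypothetical, not necessarily the merged implementation:

from typing import List, Optional, Union


def normalize_target_modules(
    target_modules: Optional[List[str]],
) -> Optional[Union[List[str], str]]:
    # Hypothetical helper: convert the parsed list into the form PEFT expects.
    if target_modules is None or target_modules == ["None"]:
        # Fall back to PEFT's architecture-based defaults.
        return None
    if target_modules == ["all-linear"]:
        # A single "all-linear" entry becomes the special string PEFT recognizes.
        return "all-linear"
    # Otherwise keep the explicit list of module names, e.g. ["q_proj", "v_proj"].
    return target_modules


# e.g. after HfArgumentParser has produced lora_config:
# lora_config.target_modules = normalize_target_modules(lora_config.target_modules)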