foundation-model-stack / fms-hf-tuning

🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
Apache License 2.0

Support str in target_modules for LoraConfig #37

Closed anhuong closed 7 months ago

anhuong commented 7 months ago

Request

LoraConfig can accept a List or a str for target_modules, as seen in the description below. Supporting this in fms-hf-tuning would make it possible to pass "all-linear" instead of listing specific attention layers.

Context

target_modules (Optional[Union[List[str], str]]) — The names of the modules to apply the adapter to. If this is specified, only the modules with the specified names will be replaced. When passing a string, a regex match will be performed. When passing a list of strings, either an exact match will be performed or it is checked if the name of the module ends with any of the passed strings. If this is specified as 'all-linear', then all linear/Conv1D modules are chosen, excluding the output layer. If this is not specified, modules will be chosen according to the model architecture. If the architecture is not known, an error will be raised — in this case, you should specify the target modules manually.
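
For reference, the two accepted forms look roughly like this when calling PEFT's LoraConfig directly (illustrative values only, not fms-hf-tuning code):

from peft import LoraConfig

# Explicit list of module names: exact or suffix matching is performed.
config_list = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])

# Special string "all-linear": targets all linear/Conv1D modules except the output layer.
config_all_linear = LoraConfig(r=8, lora_alpha=16, target_modules="all-linear")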

The fms-hf-tuning LoraConfig currently accepts only a List: target_modules: List[str] = field(default_factory=lambda: ["q_proj", "v_proj"]). As a result, if one tries to pass all-linear it is interpreted as a single-element List.

Example

$ python tuning/sft_trainer.py --target_modules "all-linear"

# interpreted as
LoraConfig(r=8, lora_alpha=16, target_modules=['all-linear'], lora_dropout=0.05)

# subsequently gets used in SFTTrainer as
LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type='CAUSAL_LM', inference_mode=False, r=8, target_modules='all-linear', lora_alpha=16, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}) 

# errors with
ValueError: Target modules {'all-linear'} not found in the base model. Please check the target modules and try again.

I tried setting target_modules: Union[List[str], str] = field(default_factory=lambda: ["q_proj", "v_proj"]); however, "all-linear" was still interpreted as a List instead of a string. This is likely due to how the command-line arguments are parsed.
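
That behaviour can be reproduced with a small stand-alone sketch (ToyLoraArgs below is a hypothetical stand-in for the fms-hf-tuning dataclass, assuming transformers' HfArgumentParser): when a Union contains str alongside another type, the parser keeps the non-str member, so the value still comes back as a list.

from dataclasses import dataclass, field
from typing import List, Union

from transformers import HfArgumentParser


@dataclass
class ToyLoraArgs:
    # Hypothetical stand-in for the fms-hf-tuning LoraConfig dataclass.
    target_modules: Union[List[str], str] = field(
        default_factory=lambda: ["q_proj", "v_proj"]
    )


parser = HfArgumentParser(ToyLoraArgs)
(args,) = parser.parse_args_into_dataclasses(["--target_modules", "all-linear"])
print(args.target_modules)  # ['all-linear'] -- still a list, not the string "all-linear"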

Note that all-linear is supported as of PEFT v0.8.0, so the pinned PEFT dependency must be upgraded accordingly.

Acceptance criteria

  1. We should be able to pass target_modules = 'all-linear' to fms-hf-tuning from the command line for LoRA tuning.
  2. The change has been tested with at least one model (e.g. llama-7b) and does not crash.

Ssukriti commented 7 months ago

Assigned to Vassilis Vassiliadis.

VassilisVassiliadis commented 7 months ago

sft_trainer.py uses transformers.HFArgumentParser to parse its command-line arguments into dataclass objects. We wouldn't want to change that by using a different parser just for this command-line parameter.

Therefore, I think the most straightforward approach here is to keep HfArgumentParser as-is and post-process the parsed target_modules value after argument parsing (see the sketch after this comment).

This will enable:

  1. handling --target_modules=all-linear
  2. programmatically invoking the train() method with target_modules=None

while retaining the current behaviour of providing a list of layers for LoRA to attach to.
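
A minimal sketch of one possible post-processing step, assuming the dataclass keeps target_modules as a List[str] after parsing; the helper name and its call site are hypothetical, not necessarily the merged implementation:

from typing import List, Optional, Union


def normalize_target_modules(
    target_modules: Optional[List[str]],
) -> Optional[Union[List[str], str]]:
    # Hypothetical helper: convert the parsed list into the form PEFT expects.
    if target_modules is None or target_modules == ["None"]:
        # Fall back to PEFT's architecture-based defaults.
        return None
    if target_modules == ["all-linear"]:
        # A single "all-linear" entry becomes the special string PEFT recognizes.
        return "all-linear"
    # Otherwise keep the explicit list of module names, e.g. ["q_proj", "v_proj"].
    return target_modules


# e.g. after HfArgumentParser has produced lora_config:
# lora_config.target_modules = normalize_target_modules(lora_config.target_modules)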