Falcon-7B finetuning errors with the example config

Please check that this issue hasn't been reported before.

[X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

I'm testing the Falcon-7B finetuning example with the config file examples/falcon/config-7b-qlora.yml as is, and expected training to start without errors.

Current behaviour

As suggested in the README, I ran the command accelerate launch -m axolotl.cli.train examples/falcon/config-7b-qlora.yml

It first fails with the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 43, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(   
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 26, in do_cli
    parsed_cfg = load_cfg(config, **kwargs)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/__init__.py", line 290, in load_cfg
    validate_config(cfg)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/config.py", line 349, in validate_config
    raise ValueError(
ValueError: ``early_stopping_patience`` requires save_steps and eval_steps to be set. eval_steps should evenly divide save_steps.
Traceback (most recent call last):
  File "/home/radhachitta/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
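
The rule stated in the ValueError above can be sketched in a few lines. This is only an illustration of the rule as worded in the message, not axolotl's actual validate_config code; the function name check_early_stopping and the step values 50 and 100 are hypothetical. The example config sets evals_per_epoch and saves_per_epoch rather than eval_steps and save_steps, which would explain why the check fires.

```python
def check_early_stopping(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    if not cfg.get("early_stopping_patience"):
        return problems  # early stopping is disabled, nothing to check

    save_steps = cfg.get("save_steps")
    eval_steps = cfg.get("eval_steps")
    if not save_steps or not eval_steps:
        problems.append(
            "early_stopping_patience requires save_steps and eval_steps to be set"
        )
    elif save_steps % eval_steps != 0:
        problems.append("eval_steps should evenly divide save_steps")
    return problems


# Mirrors the example config: per-epoch settings, but no step counts.
example_cfg = {"early_stopping_patience": 3, "evals_per_epoch": 4, "saves_per_epoch": 1}
print(check_early_stopping(example_cfg))

# Hypothetical step counts that satisfy the rule (100 is a multiple of 50).
fixed_cfg = {"early_stopping_patience": 3, "eval_steps": 50, "save_steps": 100}
print(check_early_stopping(fixed_cfg))
```

Under this reading, either unsetting early_stopping_patience or supplying explicit step counts should get past this particular check.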

After unsetting early_stopping_patience (leaving the key empty, as early_stopping_patience:), this is the error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 43, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(   
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 38, in do_cli
    dataset_meta = load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/__init__.py", line 310, in load_datasets
    tokenizer = load_tokenizer(cfg)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 178, in load_tokenizer
    raise ValueError(
ValueError: Please set lora_modules_to_save to `embed_tokens`, `lm_head` when using an adapter and changing the special tokens.
Traceback (most recent call last):
  File "/home/radhachitta/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.10', '-m', 'axolotl.cli.train', 'examples/falcon/config-7b-qlora.yml']' returned non-zero exit status 1.
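
The condition described by this ValueError can be sketched as follows. This is a minimal illustration based only on the wording of the message, not the code in axolotl's load_tokenizer; the function name needs_modules_to_save and the tokenizer_tokens dictionary (standing in for the tokenizer's existing special tokens) are hypothetical.

```python
REQUIRED_MODULES = ("embed_tokens", "lm_head")  # names taken from the error message


def needs_modules_to_save(cfg: dict, tokenizer_tokens: dict) -> bool:
    """True if an adapter run changes a special token without saving the required modules."""
    special_tokens = cfg.get("special_tokens") or {}
    changed = any(
        tokenizer_tokens.get(name) != value for name, value in special_tokens.items()
    )
    saved = cfg.get("lora_modules_to_save") or []
    missing = [module for module in REQUIRED_MODULES if module not in saved]
    return bool(cfg.get("adapter")) and changed and bool(missing)


# Hypothetical tokenizer state: only eos_token is assumed to be defined already.
tokenizer_tokens = {"eos_token": "<|endoftext|>"}

cfg = {
    "adapter": "qlora",
    "special_tokens": {
        "pad_token": "<|endoftext|>",
        "bos_token": ">>ABSTRACT<<",
        "eos_token": "<|endoftext|>",
    },
}
print(needs_modules_to_save(cfg, tokenizer_tokens))  # True

cfg_with_modules = {**cfg, "lora_modules_to_save": ["embed_tokens", "lm_head"]}
print(needs_modules_to_save(cfg_with_modules, tokenizer_tokens))  # False
```

Note that the sketch expects lora_modules_to_save to be a list of module names.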

Finally, after setting lora_modules_to_save as lora_modules_to_save: embed_tokens, lm_head, this is the error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 43, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(   
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 39, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/train.py", line 65, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 634, in load_model
    model, lora_config = load_adapter(model, cfg, cfg.adapter)
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 670, in load_adapter
    return load_lora(model, cfg, inference=inference)  
  File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 756, in load_lora
    model = get_peft_model(model, lora_config)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/mapping.py", line 133, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/peft_model.py", line 1041, in __init__
    super().__init__(model, peft_config, adapter_name) 
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/peft_model.py", line 123, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 119, in __init__
    super().__init__(model, config, adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 95, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 233, in inject_adapter
    new_module = ModulesToSaveWrapper(target, adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/utils/other.py", line 177, in __init__
    self.update(adapter_name)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/utils/other.py", line 200, in update
    self.modules_to_save[adapter_name].requires_grad_(True)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2440, in requires_grad_
    p.requires_grad_(requires_grad)
RuntimeError: only Tensors of floating point dtype can require gradients
Traceback (most recent call last):
  File "/home/radhachitta/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.10', '-m', 'axolotl.cli.train', 'examples/falcon/config-7b-qlora.yml']' returned non-zero exit status 1.
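
One possible contributing factor, not verified against the axolotl or PEFT source at this commit: in YAML, the line lora_modules_to_save: embed_tokens, lm_head is a single string, not a list of two names. The sketch below shows, in plain Python, how a string and a list behave differently under membership tests and iteration. The helper matches_any_suffix and the module name used are illustrative assumptions, not calls from either library.

```python
value_from_config = "embed_tokens, lm_head"    # what the YAML line above parses to
value_as_list = ["embed_tokens", "lm_head"]    # what a YAML list would parse to

# A membership test on a string is a substring test, so it can pass by accident.
print("lm_head" in value_from_config)  # True
print("lm_head" in value_as_list)      # True


def matches_any_suffix(module_name: str, modules_to_save) -> bool:
    """Illustrative suffix match: iterating a string yields single characters."""
    return any(module_name.endswith(suffix) for suffix in modules_to_save)


# Illustrative name for a linear layer that would be quantized under QLoRA.
layer = "transformer.h.0.mlp.dense_h_to_4h"
print(matches_any_suffix(layer, value_from_config))  # True: the name ends with "h"
print(matches_any_suffix(layer, value_as_list))      # False
```

If something similar happens downstream, quantized (non-floating-point) layers could be selected as modules to save, which would be consistent with the RuntimeError above. Writing the value as a YAML list (one "- embed_tokens" and one "- lm_head" item under the key) avoids the ambiguity, although whether those names match Falcon's actual module names would still need checking.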

Steps to reproduce

I ran the command accelerate launch -m axolotl.cli.train examples/falcon/config-7b-qlora.yml

with the changes to the YAML file described above.

Config yaml

This is the final config-7b-qlora.yml, which results in the last error:

# 1b: tiiuae/falcon-rw-1b
# 40b: tiiuae/falcon-40b
base_model: tiiuae/falcon-7b
# required by falcon custom model code: https://huggingface.co/tiiuae/falcon-7b/tree/main
trust_remote_code: false
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_falcon_derived_model: true
load_in_8bit: false
# enable 4bit for QLoRA
load_in_4bit: true
gptq: false
strict: false
push_dataset_to_hub:
datasets:
  - path: QingyiSi/Alpaca-CoT
    data_files:
      - Chain-of-Thought/formatted_cot_data/gsm8k_train.json
    type: "alpaca:chat"
dataset_prepared_path:
val_set_size: 0.05
# enable QLoRA
adapter: qlora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len:

# hyperparameters from QLoRA paper Appendix B.2
# "We find hyperparameters to be largely robust across datasets"
lora_r: 64
lora_alpha: 16
# 0.1 for models up to 13B
# 0.05 for 33B and 65B models
lora_dropout: 0.05
# add LoRA modules on all linear layers of the base model
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
output_dir: ./qlora-out
# QLoRA paper Table 9
# - 16 for 7b & 13b
# - 32 for 33b, 64 for 64b
# Max size tested on A6000
# - 7b: 40
# - 40b: 4
# decrease if OOM, increase for max VRAM utilization
micro_batch_size: 1
gradient_accumulation_steps: 2
num_epochs: 4
# Optimizer for QLoRA
optimizer: paged_adamw_32bit
torchdistx_path:
lr_scheduler: cosine
# QLoRA paper Table 9
# - 2e-4 for 7b & 13b
# - 1e-4 for 33b & 64b
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: true
gradient_checkpointing: true
# stop training after this many evaluation losses have increased in a row
# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
#early_stopping_patience: 3
early_stopping_patience:
lora_modules_to_save: embed_tokens, lm_head
resume_from_checkpoint:
auto_resume_from_checkpoints: true
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:
gptq_groupsize:
gptq_model_v1:
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.000001
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|endoftext|>"
  bos_token: ">>ABSTRACT<<"
  eos_token: "<|endoftext|>"

Possible solution

No response

Which Operating Systems are you using?

[X] Linux
[ ] macOS
[ ] Windows

Python Version

3.10.12

axolotl branch-commit

main v0.3.0

Acknowledgements

[X] My issue title is concise, descriptive, and in title casing.
[X] I have searched the existing issues to make sure this bug has not been reported yet.
[X] I am using the latest version of axolotl.
[X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

axolotl-ai-cloud / axolotl