OpenAccess-AI-Collective / axolotl

https://openaccess-ai-collective.github.io/axolotl/

TypeError: _forward_cross_attn() got an unexpected keyword argument 'cu_seqlens' #1025

Open varunmayya opened 6 months ago

varunmayya commented 6 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

Inference should work out of the box after a full fine-tune.

Current behaviour

(axolotl) root@Transformers:~/axolotl# python -m axolotl.cli.inference examples/phi/phi-ft.yml --lora-model-dir="./phi-sft-out"
/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-31 13:53:07,441] [INFO] [datasets.<module>:58] [PID:17802] PyTorch version 2.0.1 available.

[axolotl ASCII art banner]

[2023-12-31 13:53:08,703] [WARNING] [axolotl.validate_config:250] [PID:17802] [RANK:0] trust_remote_code is set to true. Please make sure that you reviewed the remote code/model.
[2023-12-31 13:53:09,764] [INFO] [axolotl.normalize_config:150] [PID:17802] [RANK:0] GPU memory usage baseline: 0.000GB (+0.886GB misc)
[2023-12-31 13:53:09,765] [INFO] [axolotl.common.cli.load_model_and_tokenizer:49] [PID:17802] [RANK:0] loading tokenizer... microsoft/phi-1_5
[2023-12-31 13:53:10,104] [DEBUG] [axolotl.load_tokenizer:185] [PID:17802] [RANK:0] EOS: 50256 / <|endoftext|>
[2023-12-31 13:53:10,104] [DEBUG] [axolotl.load_tokenizer:186] [PID:17802] [RANK:0] BOS: 50256 / <|endoftext|>
[2023-12-31 13:53:10,104] [DEBUG] [axolotl.load_tokenizer:187] [PID:17802] [RANK:0] PAD: 50256 / <|endoftext|>
[2023-12-31 13:53:10,105] [DEBUG] [axolotl.load_tokenizer:188] [PID:17802] [RANK:0] UNK: 50256 / <|endoftext|>
[2023-12-31 13:53:10,105] [INFO] [axolotl.load_tokenizer:193] [PID:17802] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2023-12-31 13:53:10,105] [INFO] [axolotl.common.cli.load_model_and_tokenizer:51] [PID:17802] [RANK:0] loading model and (optionally) peft_config...
[2023-12-31 13:53:15,476] [INFO] [axolotl.load_model:517] [PID:17802] [RANK:0] GPU memory usage after model load: 2.642GB (+0.048GB cache, +1.321GB misc)

Give me an instruction (Ctrl + D to submit): what's your name?

what's your
Traceback (most recent call last):
  File "/root/anaconda3/envs/axolotl/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/axolotl/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/axolotl/src/axolotl/cli/inference.py", line 36, in <module>
    fire.Fire(do_cli)
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/axolotl/src/axolotl/cli/inference.py", line 32, in do_cli
    do_inference(cfg=parsed_cfg, cli_args=parsed_cli_args)
  File "/root/axolotl/src/axolotl/cli/__init__.py", line 142, in do_inference
    generated = model.generate(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/transformers/generation/utils.py", line 2861, in sample
    outputs = self(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/axolotl/src/axolotl/models/phi/modeling_phi.py", line 1048, in forward
    hidden_states = self.transformer(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/axolotl/src/axolotl/models/phi/modeling_phi.py", line 997, in forward
    hidden_states = layer(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/axolotl/src/axolotl/models/phi/modeling_phi.py", line 844, in forward
    attn_outputs = self.mixer(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/axolotl/src/axolotl/models/phi/modeling_phi.py", line 794, in forward
    attn_output = self._forward_cross_attn(
TypeError: _forward_cross_attn() got an unexpected keyword argument 'cu_seqlens'
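
The traceback reduces to a plain Python signature mismatch: the mixer's forward passes a cu_seqlens keyword into _forward_cross_attn, whose signature does not declare it. A minimal self-contained sketch of that failure mode (illustrative only, not axolotl's actual code):

# Illustrative sketch of the failure mode, not axolotl's modeling_phi.py.
class Mixer:
    def _forward_cross_attn(self, x, attention_mask=None):
        # Note: no cu_seqlens parameter in this signature.
        return x

    def forward(self, x, cu_seqlens=None, **kwargs):
        # Forwarding a keyword the callee does not declare raises TypeError.
        return self._forward_cross_attn(x, cu_seqlens=cu_seqlens, **kwargs)

Mixer().forward("hidden_states")
# TypeError: _forward_cross_attn() got an unexpected keyword argument 'cu_seqlens'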

Steps to reproduce

  1. Do a full fine-tune of phi on a local dataset, with no config changes apart from eval_sample_packing: false
  2. Run inference after training (see the command sketch below)
  3. Inference fails with the TypeError above
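
For reference, the reproduction boils down to two commands. The inference invocation is copied from the log above; the training invocation assumes the standard axolotl CLI entry point:

accelerate launch -m axolotl.cli.train examples/phi/phi-ft.yml
python -m axolotl.cli.inference examples/phi/phi-ft.yml --lora-model-dir="./phi-sft-out"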

Config yaml

base_model: microsoft/phi-1_5
model_type: PhiForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: false
trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./phi-sft-out

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true
eval_sample_packing: false

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 4
optimizer: adamw_torch
adam_beta2: 0.95
adam_epsilon: 0.00001
max_grad_norm: 1.0
lr_scheduler: cosine
learning_rate: 0.000003

train_on_inputs: false
group_by_length: true
bf16: true
fp16: false
tf32: true

gradient_checkpointing:
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 100
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
resize_token_embeddings_to_32x: true
special_tokens:
  bos_token: "<|endoftext|>"
  eos_token: "<|endoftext|>"
  unk_token: "<|endoftext|>"
  pad_token: "<|endoftext|>"
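
For context on the offending argument: cu_seqlens is the cumulative-sequence-lengths tensor that flash-attention-style varlen kernels use to mark sample boundaries when sample packing is enabled. With sample_packing: false, as in this config, it should never reach the attention layer, yet the custom phi mixer evidently still receives and forwards it. A small illustrative sketch of what such a tensor looks like (hypothetical lengths, not taken from this run):

import torch

# Hypothetical packed batch of three samples with lengths 5, 3, and 8.
seq_lens = [5, 3, 8]
# cu_seqlens is the running sum of lengths, starting at 0.
cu_seqlens = torch.cumsum(torch.tensor([0] + seq_lens), dim=0, dtype=torch.int32)
print(cu_seqlens)  # tensor([ 0,  5,  8, 16], dtype=torch.int32)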

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.9

axolotl branch-commit

main

Acknowledgements

NanoCode012 commented 3 months ago

Hello, it's been a while and phi has been updated. Do you still get this issue with the new model? Does it error with base phi?
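
For anyone retesting, a quick way to rule axolotl's custom modeling_phi.py in or out is to generate with the base model through plain transformers. A minimal sketch, with the model id and prompt taken from the report above:

# Sanity-check base phi-1_5 generation without axolotl's custom modeling code.
# Assumes transformers and torch are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

inputs = tokenizer("what's your name?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))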