OpenAccess-AI-Collective / axolotl

https://openaccess-ai-collective.github.io/axolotl/

TypeError: _forward_cross_attn() got an unexpected keyword argument 'cu_seqlens' #1025

Open varunmayya opened 6 months ago

varunmayya commented 6 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

Inference should work out of the box after a full fine-tune.

Current behaviour

(axolotl) root@Transformers:~/axolotl# python -m axolotl.cli.inference examples/phi/phi-ft.yml --lora-model-dir="./phi-sft-out"
/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2023-12-31 13:53:07,441] [INFO] [datasets.<module>:58] [PID:17802] PyTorch version 2.0.1 available.

[axolotl ASCII art banner]

[2023-12-31 13:53:08,703] [WARNING] [axolotl.validate_config:250] [PID:17802] [RANK:0] trust_remote_code is set to true. Please make sure that you reviewed the remote code/model.
[2023-12-31 13:53:09,764] [INFO] [axolotl.normalize_config:150] [PID:17802] [RANK:0] GPU memory usage baseline: 0.000GB (+0.886GB misc)
[2023-12-31 13:53:09,765] [INFO] [axolotl.common.cli.load_model_and_tokenizer:49] [PID:17802] [RANK:0] loading tokenizer... microsoft/phi-1_5
[2023-12-31 13:53:10,104] [DEBUG] [axolotl.load_tokenizer:185] [PID:17802] [RANK:0] EOS: 50256 / <|endoftext|>
[2023-12-31 13:53:10,104] [DEBUG] [axolotl.load_tokenizer:186] [PID:17802] [RANK:0] BOS: 50256 / <|endoftext|>
[2023-12-31 13:53:10,104] [DEBUG] [axolotl.load_tokenizer:187] [PID:17802] [RANK:0] PAD: 50256 / <|endoftext|>
[2023-12-31 13:53:10,105] [DEBUG] [axolotl.load_tokenizer:188] [PID:17802] [RANK:0] UNK: 50256 / <|endoftext|>
[2023-12-31 13:53:10,105] [INFO] [axolotl.load_tokenizer:193] [PID:17802] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2023-12-31 13:53:10,105] [INFO] [axolotl.common.cli.load_model_and_tokenizer:51] [PID:17802] [RANK:0] loading model and (optionally) peft_config...
[2023-12-31 13:53:15,476] [INFO] [axolotl.load_model:517] [PID:17802] [RANK:0] GPU memory usage after model load: 2.642GB (+0.048GB cache, +1.321GB misc)

Give me an instruction (Ctrl + D to submit): what's your name?

what's your
Traceback (most recent call last):
  File "/root/anaconda3/envs/axolotl/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/axolotl/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/axolotl/src/axolotl/cli/inference.py", line 36, in <module>
    fire.Fire(do_cli)
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/axolotl/src/axolotl/cli/inference.py", line 32, in do_cli
    do_inference(cfg=parsed_cfg, cli_args=parsed_cli_args)
  File "/root/axolotl/src/axolotl/cli/__init__.py", line 142, in do_inference
    generated = model.generate(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/transformers/generation/utils.py", line 2861, in sample
    outputs = self(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/axolotl/src/axolotl/models/phi/modeling_phi.py", line 1048, in forward
    hidden_states = self.transformer(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/axolotl/src/axolotl/models/phi/modeling_phi.py", line 997, in forward
    hidden_states = layer(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/axolotl/src/axolotl/models/phi/modeling_phi.py", line 844, in forward
    attn_outputs = self.mixer(
  File "/root/anaconda3/envs/axolotl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/axolotl/src/axolotl/models/phi/modeling_phi.py", line 794, in forward
    attn_output = self._forward_cross_attn(
TypeError: _forward_cross_attn() got an unexpected keyword argument 'cu_seqlens'
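
The traceback reduces to a plain Python signature mismatch: the mixer's forward passes a cu_seqlens keyword into _forward_cross_attn, whose signature does not declare it. A minimal self-contained sketch of that failure mode (illustrative only, not axolotl's actual code):

# Illustrative sketch of the failure mode, not axolotl's modeling_phi.py.
class Mixer:
    def _forward_cross_attn(self, x, attention_mask=None):
        # Note: no cu_seqlens parameter in this signature.
        return x

    def forward(self, x, cu_seqlens=None, **kwargs):
        # Forwarding a keyword the callee does not declare raises TypeError.
        return self._forward_cross_attn(x, cu_seqlens=cu_seqlens, **kwargs)

Mixer().forward("hidden_states")
# TypeError: _forward_cross_attn() got an unexpected keyword argument 'cu_seqlens'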

Steps to reproduce

  1. Do a full fine-tune of phi on a local dataset, with no config changes apart from eval_sample_packing: false
  2. Run inference after training (see the command sketch below)
  3. Inference fails with the TypeError above
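
For reference, the reproduction boils down to two commands. The inference invocation is copied from the log above; the training invocation assumes the standard axolotl CLI entry point:

accelerate launch -m axolotl.cli.train examples/phi/phi-ft.yml
python -m axolotl.cli.inference examples/phi/phi-ft.yml --lora-model-dir="./phi-sft-out"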

Config yaml

base_model: microsoft/phi-1_5
model_type: PhiForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: false
trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./phi-sft-out

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true
eval_sample_packing: false

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 4
optimizer: adamw_torch
adam_beta2: 0.95
adam_epsilon: 0.00001
max_grad_norm: 1.0
lr_scheduler: cosine
learning_rate: 0.000003

train_on_inputs: false
group_by_length: true
bf16: true
fp16: false
tf32: true

gradient_checkpointing:
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 100
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.1
fsdp:
fsdp_config:
resize_token_embeddings_to_32x: true
special_tokens:
  bos_token: "<|endoftext|>"
  eos_token: "<|endoftext|>"
  unk_token: "<|endoftext|>"
  pad_token: "<|endoftext|>"
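
For context on the offending argument: cu_seqlens is the cumulative-sequence-lengths tensor that flash-attention-style varlen kernels use to mark sample boundaries when sample packing is enabled. With sample_packing: false, as in this config, it should never reach the attention layer, yet the custom phi mixer evidently still receives and forwards it. A small illustrative sketch of what such a tensor looks like (hypothetical lengths, not taken from this run):

import torch

# Hypothetical packed batch of three samples with lengths 5, 3, and 8.
seq_lens = [5, 3, 8]
# cu_seqlens is the running sum of lengths, starting at 0.
cu_seqlens = torch.cumsum(torch.tensor([0] + seq_lens), dim=0, dtype=torch.int32)
print(cu_seqlens)  # tensor([ 0,  5,  8, 16], dtype=torch.int32)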

Possible solution

No response

Which Operating Systems are you using?

Python Version

3.9

axolotl branch-commit

main

Acknowledgements

NanoCode012 commented 3 months ago

Hello, it's been a while and phi has been updated. Do you still get this issue with the new model? Does it error with base phi?
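
For anyone retesting, a quick way to rule axolotl's custom modeling_phi.py in or out is to generate with the base model through plain transformers. A minimal sketch, with the model id and prompt taken from the report above:

# Sanity-check base phi-1_5 generation without axolotl's custom modeling code.
# Assumes transformers and torch are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

inputs = tokenizer("what's your name?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))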