Open · lucyknada opened 4 months ago
Zamba is not officially supported in transformers. I expect that simply adding a catch-all `**kwargs` to the forward would help; see https://github.com/Zyphra/transformers_zamba2/blob/main/src/transformers/models/zamba2/modeling_zamba2.py#L786
Thanks for pointing this out! The error is related to the trainer passing an additional positional argument, so we added `*args` in this commit: https://github.com/Zyphra/transformers_zamba2/commit/8af28567d3e8913730243167ec5c70c1a656bbcd. Please let us know if you run into any more issues with this.
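For context, here is a minimal sketch of the kind of signature change being discussed in the two comments above. The class and argument names are illustrative placeholders, not the actual Zamba2 code; the real change lives in the linked commit.

```python
import torch
import torch.nn as nn


class ExampleDecoderLayer(nn.Module):
    """Illustrative layer showing the catch-all signature change."""

    def __init__(self, hidden_size: int = 16):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    # Accepting *args and **kwargs lets the layer tolerate extra
    # positional/keyword arguments that a trainer may pass through,
    # instead of raising a TypeError on an unexpected argument.
    def forward(self, hidden_states: torch.Tensor, *args, **kwargs) -> torch.Tensor:
        return self.proj(hidden_states)


if __name__ == "__main__":
    layer = ExampleDecoderLayer()
    x = torch.randn(2, 4, 16)
    # Extra positional/keyword arguments (names here are made up) are
    # silently ignored rather than crashing the forward pass.
    out = layer(x, None, position_ids=None, cache_position=None)
    print(out.shape)
```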
Please check that this issue hasn't been reported before.
Expected Behavior
No error; training should run without crashing.
Current behaviour
Steps to reproduce
Install the Zamba2 fork of transformers:
Then run the config on 8x A6000 GPUs; it crashes.
Config yaml
Possible solution
No response
Which Operating Systems are you using?
Python Version
Python 3.10.14
axolotl branch-commit
c5587b45accdc50e0be7b2ec9ed3a66879d68156
Acknowledgements