
ORPO seems broken with `micro_batch_size` or `eval_batch_size` > 1 #1489

Closed: xzuyn closed this issue 5 months ago

xzuyn commented 6 months ago


Expected Behavior

It should run without error, as it does when `micro_batch_size` and `eval_batch_size` are both set to 1.

Current behaviour

Training fails with two errors:

ValueError: expected sequence of length 406 at dim 1 (got 75)

ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (rejected_input_ids in this case) have excessive nesting (inputs type list where type int is expected).

Traceback (most recent call last):
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 759, in convert_to_tensors
    tensor = as_tensor(value)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 721, in as_tensor
    return torch.tensor(value)
ValueError: expected sequence of length 406 at dim 1 (got 75)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/train.py", line 160, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/trainer.py", line 1780, in train
    return inner_training_loop(
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/trainer.py", line 2085, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/accelerate/data_loader.py", line 451, in __iter__
    current_batch = next(dataloader_iter)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/monkeypatch/data/batch_dataset_fetcher.py", line 32, in fetch
    return self.collate_fn(data)
  File "/media/xzuyn/NVMe/LClones/axolotl/src/axolotl/utils/collators.py", line 106, in __call__
    features = self.tokenizer.pad(
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3369, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 224, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/media/xzuyn/NVMe/LClones/axolotl/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 775, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`rejected_input_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
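
Both errors describe the same failure. `tokenizer.pad` only pads the keys it recognizes (`input_ids`, `attention_mask`, `token_type_ids`, `special_tokens_mask`), so a preference key like `rejected_input_ids` appears to reach tensor conversion as a ragged list of lists, at which point `torch.tensor` raises. A minimal sketch of that conversion step, with made-up token values:

```python
import torch

# After tokenizer.pad, input_ids are padded to a common length, but a key the
# tokenizer does not recognize (rejected_input_ids) keeps its ragged lengths.
padded = {
    "input_ids": [[1, 2, 3, 0], [4, 5, 6, 7]],
    "rejected_input_ids": [[1, 2], [3, 4, 5]],
}

torch.tensor(padded["input_ids"])           # fine: rectangular
torch.tensor(padded["rejected_input_ids"])  # ValueError: expected sequence of length 2 at dim 1 (got 3)
```

This also explains the test matrix below: a batch containing a single example is trivially rectangular, so the crash only appears once either batch size exceeds 1.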

Steps to reproduce

Run the YAML provided below, which sets `micro_batch_size` and `eval_batch_size` to 2.

I tested:

- `micro_batch_size: 1` & `eval_batch_size: 1` - Works
- `micro_batch_size: 2` & `eval_batch_size: 2` - Errors
- `micro_batch_size: 2` & `eval_batch_size: 1` - Errors
- `micro_batch_size: 1` & `eval_batch_size: 2` - Errors
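
Until the collator handles this, a workaround consistent with those results is to keep both batch sizes at 1 and raise `gradient_accumulation_steps` to preserve the effective batch size. A quick check of the arithmetic, assuming the single-GPU setup implied by this config:

```python
# effective batch size = micro_batch_size * gradient_accumulation_steps * num_gpus
original   = 2 * 8 * 1   # micro_batch_size: 2, gradient_accumulation_steps: 8
workaround = 1 * 16 * 1  # micro_batch_size: 1, gradient_accumulation_steps: 16
assert original == workaround == 16
```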

Config yaml

wandb_project: MV02-7B
wandb_entity:
wandb_watch:
wandb_name: ORPO-QLoRA-run_1-Test-1
wandb_log_model:

output_dir: ./MV02-Test-1-run_1-ORPO-7B-QLoRA
resume_from_checkpoint:
save_steps: 10
saves_per_epoch:
save_safetensors: true
save_total_limit: 5
hub_model_id:
hub_strategy:

base_model: alpindale/Mistral-7B-v0.2-hf
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_llama_derived_model: false
is_mistral_derived_model: true
is_falcon_derived_model: false
is_qwen_derived_model: false

bf16: true
fp16: false
tf32: false

load_in_8bit: false
load_in_4bit: true
strict: false

sequence_len: 4096
s2_attention: false
sample_packing: false
pad_to_sequence_len: false
train_on_inputs: false
group_by_length: false

adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 64
lora_dropout: 0.125
lora_fan_in_fan_out:
lora_target_linear:
save_embedding_layers:
peft_layers_to_transform:
peft_use_dora:
peft_use_rslora: true
peft_layer_replication:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:

unfrozen_parameters:

rl: orpo
orpo_alpha: 0.1
remove_unused_columns: false
chat_template: chatml
datasets:
  - path: argilla/ultrafeedback-binarized-preferences-cleaned
    type: orpo.chat_template
val_set_size: 0.01
eval_sample_packing: false
evaluation_strategy: steps
eval_steps: 10
evals_per_epoch:
test_datasets:
dataset_prepared_path: ./Test-1-seed42
push_dataset_to_hub:
hf_use_auth_token:
shuffle_merged_datasets: true

num_epochs: 1
gradient_accumulation_steps: 8
micro_batch_size: 2
eval_batch_size: 2
warmup_steps: 0
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.00001
loraplus_lr_ratio: 8
loraplus_lr_embedding:
cosine_min_lr_ratio:
weight_decay: 0.01
max_grad_norm: 1.0
logging_steps: 1

gradient_checkpointing: true
early_stopping_patience: false
local_rank:
xformers_attention: false
flash_attention: false
sdp_attention: true

loss_watchdog_threshold: 100.0
loss_watchdog_patience: 3

debug: true
seed: 42
deepspeed:
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

Possible solution

No response
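
One plausible direction, based on how TRL's DPO-style data collators avoid the same shape problem: pad every preference key explicitly before building tensors, rather than handing the whole batch to `tokenizer.pad`. A sketch with hypothetical names, not axolotl's actual collator:

```python
import torch

def pad_to_batch_max(seqs, pad_value):
    # Pad a list of variable-length Python lists to the longest one in the batch.
    max_len = max(len(s) for s in seqs)
    return torch.tensor([s + [pad_value] * (max_len - len(s)) for s in seqs])

def collate_preference_batch(features, pad_token_id, label_pad_token_id=-100):
    # Assumes every feature key is a token sequence: *_input_ids,
    # *_attention_mask, or *_labels (e.g. rejected_input_ids, rejected_labels).
    batch = {}
    for key in features[0]:
        seqs = [f[key] for f in features]
        if key.endswith("labels"):
            batch[key] = pad_to_batch_max(seqs, label_pad_token_id)
        elif key.endswith("attention_mask"):
            batch[key] = pad_to_batch_max(seqs, 0)
        else:
            batch[key] = pad_to_batch_max(seqs, pad_token_id)
    return batch
```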

Which Operating Systems are you using?

Linux

Python Version

3.10.12

axolotl branch-commit

main/bda48f0


LeeWonc commented 6 months ago

Same issue....