Fine tuning and --evaluation_strategy argument

I'm trying to get fine-tuning working through the 3_sft.sh script but am encountering an error:

Traceback (most recent call last):
  File "/root/VILA/llava/train/train_mem.py", line 36, in <module>
    train()
  File "/root/VILA/llava/train/train.py", line 436, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2738, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2761, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1735, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/VILA/llava/model/language_model/llava_llama.py", line 133, in forward
    outputs = self.llm.forward(
TypeError: LlamaForCausalLM.forward() got an unexpected keyword argument 'seqlens_in_batch'
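
The TypeError means the LlamaForCausalLM.forward() that actually gets called does not accept a seqlens_in_batch keyword. As a rough sketch of a guard that is less invasive than editing the call site, one could filter keyword arguments against the callee's signature before calling it. The helper name call_with_supported_kwargs and the stand-in stock_forward below are hypothetical and not part of VILA or transformers:

```python
import inspect


def call_with_supported_kwargs(fn, **kwargs):
    """Call fn, dropping keyword arguments that its signature does not accept."""
    params = inspect.signature(fn).parameters
    # A callee that declares **kwargs accepts everything, so pass it all through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(**kwargs)
    supported = {k: v for k, v in kwargs.items() if k in params}
    return fn(**supported)


# Hypothetical stand-in for a forward() that lacks seqlens_in_batch.
def stock_forward(input_ids=None, labels=None):
    return {"input_ids": input_ids, "labels": labels}


result = call_with_supported_kwargs(
    stock_forward,
    input_ids=[1, 2, 3],
    labels=[1, 2, 3],
    seqlens_in_batch=[3],  # silently dropped: stock_forward does not accept it
)
```

Note that this only avoids the TypeError. Whatever the model was supposed to do with seqlens_in_batch is lost when the argument is dropped, so it is a diagnostic workaround at best, the same as commenting the argument out.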

I tried commenting out the seqlens_in_batch argument where self.llm.forward() is called, and with that change the script runs. However, when I try to get validation scores by setting --evaluation_strategy to something other than "no", I get a series of errors related to the dataloader and the dataset 'inputs':

Traceback (most recent call last):
  File "/root/VILA/llava/train/train_mem.py", line 36, in <module>
    train()
  File "/root/VILA/llava/train/train.py", line 436, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2262, in _maybe_log_save_evaluate
    dataset_metrics = self.evaluate(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3022, in evaluate
    output = eval_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3212, in evaluation_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3429, in prediction_step
    loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2761, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/VILA/llava/model/language_model/llava_llama.py", line 102, in forward
    ) = self.prepare_inputs_labels_for_multimodal(
  File "/root/VILA/llava/model/llava_arch.py", line 261, in prepare_inputs_labels_for_multimodal
    if vision_tower is None or images is None or input_ids.shape[1] == 1:
IndexError: tuple index out of range
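
The "tuple index out of range" on input_ids.shape[1] suggests that input_ids has fewer than two dimensions when evaluation reaches this line, i.e. it does not look like a (batch, seq_len) tensor. I have not confirmed why. The sketch below only reproduces the error message with a plain tuple standing in for a tensor shape, plus a hypothetical helper (describe_batch_shape) of the kind one could call on input_ids.shape just before the failing line:

```python
def describe_batch_shape(shape):
    """Report whether a shape looks like (batch, seq_len) or is missing a dimension."""
    if len(shape) < 2:
        return f"unbatched: shape={tuple(shape)}, expected (batch, seq_len)"
    return f"batched: batch={shape[0]}, seq_len={shape[1]}"


# Stand-in for the .shape of a 1-D input_ids tensor (an assumption, not observed).
shape_1d = (17,)

try:
    shape_1d[1]  # same indexing pattern as input_ids.shape[1]
except IndexError as exc:
    message = str(exc)

print(message)
print(describe_batch_shape(shape_1d))
print(describe_batch_shape((2, 17)))
```

If the helper reports an unbatched shape during evaluation only, that would point at how eval batches are built rather than at the model code.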

Any suggestions?

NVlabs / VILA, issue #122