IamAdiSri / hf-trim

Reduce the size of pretrained Hugging Face models via vocabulary trimming.
Mozilla Public License 2.0

IndexError: index out of range in self | Issue #5

Open · SoshyHayami opened this issue 1 month ago

SoshyHayami commented 1 month ago

Hi, it sounds like the code doesn't work. Do you have any suggestions?

I followed the exact same steps, using transformers 4.24 and also the most recent version.

torch: 2.1.2

IndexError                                Traceback (most recent call last)
Cell In[23], line 11
      1 trainer = Seq2SeqTrainer(
      2     model=model,
      3     args=training_args,
   (...)
      8 
      9 )
---> 11 trainer.train()

File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:1501, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1496     self.model_wrapped = self.model
   1498 inner_training_loop = find_executable_batch_size(
   1499     self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1500 )
-> 1501 return inner_training_loop(
   1502     args=args,
   1503     resume_from_checkpoint=resume_from_checkpoint,
   1504     trial=trial,
   1505     ignore_keys_for_eval=ignore_keys_for_eval,
   1506 )

File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:1749, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1747         tr_loss_step = self.training_step(model, inputs)
   1748 else:
-> 1749     tr_loss_step = self.training_step(model, inputs)
   1751 if (
   1752     args.logging_nan_inf_filter
   1753     and not is_torch_tpu_available()
   1754     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   1755 ):
   1756     # if loss is nan or inf simply add the average of previous logged losses
   1757     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:2508, in Trainer.training_step(self, model, inputs)
   2505     return loss_mb.reduce_mean().detach().to(self.args.device)
   2507 with self.compute_loss_context_manager():
-> 2508     loss = self.compute_loss(model, inputs)
   2510 if self.args.n_gpu > 1:
   2511     loss = loss.mean()  # mean() to average on multi-gpu parallel training

File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:2540, in Trainer.compute_loss(self, model, inputs, return_outputs)
   2538 else:
   2539     labels = None
-> 2540 outputs = model(**inputs)
   2541 # Save past state if it exists
   2542 # TODO: this needs to be fixed and made cleaner later.
   2543 if self.args.past_index >= 0:

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /opt/conda/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:1611, in T5ForConditionalGeneration.forward(self, input_ids, attention_mask, decoder_input_ids, decoder_attention_mask, head_mask, decoder_head_mask, cross_attn_head_mask, encoder_outputs, past_key_values, inputs_embeds, decoder_inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1608 # Encode if needed (training, first prediction pass)
   1609 if encoder_outputs is None:
   1610     # Convert encoder inputs in embeddings if needed
-> 1611     encoder_outputs = self.encoder(
   1612         input_ids=input_ids,
   1613         attention_mask=attention_mask,
   1614         inputs_embeds=inputs_embeds,
   1615         head_mask=head_mask,
   1616         output_attentions=output_attentions,
   1617         output_hidden_states=output_hidden_states,
   1618         return_dict=return_dict,
   1619     )
   1620 elif return_dict and not isinstance(encoder_outputs, BaseModelOutput):
   1621     encoder_outputs = BaseModelOutput(
   1622         last_hidden_state=encoder_outputs[0],
   1623         hidden_states=encoder_outputs[1] if len(encoder_outputs) > 1 else None,
   1624         attentions=encoder_outputs[2] if len(encoder_outputs) > 2 else None,
   1625     )

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /opt/conda/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:941, in T5Stack.forward(self, input_ids, attention_mask, encoder_hidden_states, encoder_attention_mask, inputs_embeds, head_mask, cross_attn_head_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    939 if inputs_embeds is None:
    940     assert self.embed_tokens is not None, "You have to initialize the model with valid token embeddings"
--> 941     inputs_embeds = self.embed_tokens(input_ids)
    943 batch_size, seq_length = input_shape
    945 # required mask seq length can be calculated via length of past

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py:162, in Embedding.forward(self, input)
    161 def forward(self, input: Tensor) -> Tensor:
--> 162     return F.embedding(
    163         input, self.weight, self.padding_idx, self.max_norm,
    164         self.norm_type, self.scale_grad_by_freq, self.sparse)

File /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py:2233, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2227     # Note [embedding_renorm set_grad_enabled]
   2228     # XXX: equivalent to
   2229     # with torch.no_grad():
   2230     #   torch.embedding_renorm_
   2231     # remove once script supports set_grad_enabled
   2232     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2233 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

IndexError: index out of range in self 
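
For context, the final frame is a plain PyTorch embedding lookup, and this error is exactly what happens when some input token ID is greater than or equal to the number of rows in the embedding matrix. A tiny self-contained reproduction of the mechanism (not the notebook's code, just an illustration):

```python
import torch
import torch.nn as nn

# An embedding table with 10 rows, so the valid token IDs are 0..9.
embed = nn.Embedding(num_embeddings=10, embedding_dim=4)

print(embed(torch.tensor([9])).shape)  # torch.Size([1, 4]) -- in range
embed(torch.tensor([10]))              # IndexError: index out of range in self
```

With a trimmed model, this usually means the batch contains token IDs produced by a vocabulary larger than the trimmed embedding table.
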
IamAdiSri commented 1 month ago

Hi! Could you post the code that you're running for trimming and training? Also attach the output of pip freeze so I can inspect the environment.

Will try to have a look later today.

SoshyHayami commented 1 month ago

Damn, thanks a lot for the fast response! I already closed the Kaggle environment, but I can share the notebook (it's mostly based on the official HF tutorial). I won't clean it up, in case something in it gives you a clue:

https://www.kaggle.com/code/respair/notebooka6f94376b2 (You'll have to add the data_collator back into the trainer arguments if the notebook isn't updated yet. You can also ignore the protobuf installation cell; I never ran it, and the problem persists regardless.)

By the way, have you seen this? I'm wondering what the difference is between your work and that.

SoshyHayami commented 1 month ago

I'm getting the same error even with that repo, while in both cases it works just fine if I use the original models. Are you sure training with a trimmed vocab is even possible?

IamAdiSri commented 1 month ago

Sorry, I'm busy until next week and won't be able to get to this immediately.

But yes, training is definitely possible. I fine-tuned all the trimmed models while working on my graduate thesis, and it worked back then. I'm guessing there have been some changes to the code on Hugging Face's end, which is why it's failing now.

In the meantime, you could try using transformers==4.17.0 instead of the latest version and see if that helps.
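
Independent of the transformers version, this particular error after vocabulary trimming is often a mismatch between the tokenizer used to preprocess the dataset (the original, untrimmed one) and the trimmed embedding table of the model. A minimal consistency check along these lines, using only the standard transformers API and placeholder paths rather than the actual checkpoints from this thread, can rule that out:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder paths for illustration only; substitute the trimmed
# tokenizer and model directories saved after running hf-trim.
tokenizer = AutoTokenizer.from_pretrained("path/to/trimmed-tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained("path/to/trimmed-model")

embedding_rows = model.get_input_embeddings().num_embeddings
if embedding_rows != len(tokenizer):
    # Any token ID >= embedding_rows reproduces the IndexError above.
    # Re-tokenize the dataset with the trimmed tokenizer, or as a blunt
    # fallback grow the table with model.resize_token_embeddings(len(tokenizer)).
    print(f"Mismatch: tokenizer has {len(tokenizer)} tokens, "
          f"embedding table has {embedding_rows} rows.")
```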