InternLM / InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Fix Tensor Dimension Mismatch in Padding Operation for Batch Processing #254

Open sugary199 opened 2 months ago

sugary199 commented 2 months ago

Previously, pad_embeds was constructed by repeating the pad_embed tensor along the wrong dimension, which caused a size mismatch when concatenating it with inputs['inputs_embeds']. The error message is as follows:

Process Process-1:
Traceback (most recent call last):
  File "/ML-A100/team/mm/shuyu/anaconda3/envs/intern_clean/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/ML-A100/team/mm/shuyu/anaconda3/envs/intern_clean/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/ML-A100/team/mm/shuyu/workspace/projects/InternLM-XComposer/cap_train.py", line 159, in inferCaptionsAndSave
    inputs = torch.cat([pad_embeds, inputs['inputs_embeds']], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 57 but got size 1 for tensor number 1 in the list.
FINISHED!

Modification: specify dimension 1 (the sequence dimension) when repeating pad_embed to prepare pad_embeds, so it keeps the same batch size as inputs['inputs_embeds'].
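
For reference, a minimal sketch of the corrected padding; the helper and variable names below are illustrative, not the exact cap_train.py code:

```python
import torch

def left_pad_embeds(pad_embed, inputs_embeds, target_len):
    """Left-pad inputs_embeds to target_len using pad_embed.

    pad_embed:     [1, 1, embed_dim]  embedding of a single pad token
    inputs_embeds: [batch, seq_len, embed_dim]
    """
    batch, seq_len, _ = inputs_embeds.shape
    pad_len = target_len - seq_len
    if pad_len <= 0:
        return inputs_embeds
    # Repeat along dim=1 (the sequence dimension), not dim=0 (the batch
    # dimension), so pad_embeds keeps the same batch size as inputs_embeds.
    pad_embeds = pad_embed.repeat(batch, pad_len, 1)
    return torch.cat([pad_embeds, inputs_embeds], dim=1)
```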

This issue was not triggered by the official examples because the difference in token counts across the batch is only 1. I therefore increased the difference in token counts between the two examples to reproduce it.