ReflectionL opened 1 month ago
In the VSP-LLM GitHub project, batch training and inference have not yet been implemented. To make batch training straightforward, we recommend left-padding the instructions, visual features, and labels in each llm_input, and passing a matching attention mask to the LLM.
For instance: instruction = [x, x, pad], visual feature = [x, x, pad], labels = [x, x, x, pad, pad] -> llm_input = [pad, pad, pad, pad, x, x, x, x, x, x, x]
By implementing this process, you can efficiently train the VSP-LLM model using batches.
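A minimal sketch of the left-padding idea above, assuming token-id sequences and PyTorch (the function name `left_pad_collate` and the pad id are hypothetical, not from the VSP-LLM codebase; for visual-feature embeddings the same pattern applies along the sequence dimension):

```python
import torch

def left_pad_collate(sequences, pad_id=0):
    """Left-pad variable-length sequences into one batch tensor and
    build the matching attention mask (1 = real token, 0 = padding).
    `pad_id` is a placeholder; use your tokenizer's pad token id."""
    max_len = max(len(s) for s in sequences)
    batch, mask = [], []
    for s in sequences:
        n_pad = max_len - len(s)
        batch.append([pad_id] * n_pad + list(s))  # pads go on the LEFT
        mask.append([0] * n_pad + [1] * len(s))
    return torch.tensor(batch), torch.tensor(mask)

# Example: two llm_inputs of different lengths
inputs, attn_mask = left_pad_collate([[5, 6, 7], [8, 9]])
# inputs    -> [[5, 6, 7], [0, 8, 9]]
# attn_mask -> [[1, 1, 1], [0, 1, 1]]
```

Left padding keeps the real tokens right-aligned, so autoregressive generation can continue directly from the last position of every row in the batch.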
In the training process, I found that if I set batch size > 1, the loss sometimes becomes NaN, and some logits are NaN as well.
I checked the padding of the labels and features, and it all looks correct. If I set the batch size to 1, the problem does not happen, so I think this may not be an issue caused by quantization. Need some help, please.
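One common cause of NaN with batch size > 1 (purely a guess, not confirmed for this repo) is that padded label positions are not excluded from the loss. A hedged sketch of a masked cross-entropy plus a NaN probe you can drop into the training loop; `masked_lm_loss` and `report_nan` are hypothetical helper names:

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, labels, attention_mask, ignore_index=-100):
    """Cross-entropy that guarantees padded positions are excluded.
    If padded labels are left as real token ids, the loss is averaged
    over garbage positions; if a row is entirely padding, the mean can
    become NaN. Setting them to `ignore_index` avoids both."""
    labels = labels.masked_fill(attention_mask == 0, ignore_index)
    return F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=ignore_index,
    )

def report_nan(name, t):
    """Print and return the number of NaN entries in a tensor."""
    n = torch.isnan(t).sum().item()
    if n:
        print(f"{name}: {n} NaN values out of {t.numel()}")
    return n
```

Checking the logits right after the forward pass with something like `report_nan("logits", logits)` can narrow down whether the NaN originates in the model (e.g., fully-masked attention rows or fp16 overflow) or in the loss computation.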