InternLM / xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
https://xtuner.readthedocs.io/zh-cn/latest/
Apache License 2.0

When building single-image multi-turn conversation data for InternVL, does every turn need the <image> tag? #913

Open deep-practice opened 1 month ago

deep-practice commented 1 month ago

For single-image multi-turn data where only the first turn carries the <image> tag, the following warnings appear:

08/27 12:00:39 - mmengine - INFO - Iter(train) [ 10/21336] lr: 2.8189e-07 eta: 1 day, 0:19:11 time: 4.1054 data_time: 0.0126 memory: 19828 loss: 1.5446
warning: The size of tensor a (768) must match the size of tensor b (1536) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([768, 4096]), vit_embeds.shape=torch.Size([1536, 4096])
warning: The size of tensor a (3328) must match the size of tensor b (4096) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([3328, 4096]), vit_embeds.shape=torch.Size([4096, 4096])
warning: The size of tensor a (0) must match the size of tensor b (5632) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([0, 4096]), vit_embeds.shape=torch.Size([5632, 4096])
warning: The size of tensor a (2816) must match the size of tensor b (6144) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([2816, 4096]), vit_embeds.shape=torch.Size([6144, 4096])
warning: The size of tensor a (768) must match the size of tensor b (1536) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([768, 4096]), vit_embeds.shape=torch.Size([1536, 4096])
warning: The size of tensor a (2816) must match the size of tensor b (3072) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([2816, 4096]), vit_embeds.shape=torch.Size([3072, 4096])
08/27 12:00:56 - mmengine - INFO - Iter(train) [ 20/21336] lr: 5.9487e-07 eta: 16:55:59 time: 1.6142 data_time: 0.0193 memory: 22302 loss: 1.8959
warning: The size of tensor a (0) must match the size of tensor b (1024) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([0, 4096]), vit_embeds.shape=torch.Size([1024, 4096])
warning: The size of tensor a (256) must match the size of tensor b (1024) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([256, 4096]), vit_embeds.shape=torch.Size([1024, 4096])
warning: The size of tensor a (0) must match the size of tensor b (4608) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([0, 4096]), vit_embeds.shape=torch.Size([4608, 4096])
warning: The size of tensor a (256) must match the size of tensor b (1024) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([256, 4096]), vit_embeds.shape=torch.Size([1024, 4096])
warning: The size of tensor a (0) must match the size of tensor b (3584) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([0, 4096]), vit_embeds.shape=torch.Size([3584, 4096])
warning: The size of tensor a (2816) must match the size of tensor b (3584) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([2816, 4096]), vit_embeds.shape=torch.Size([3584, 4096])
warning: The size of tensor a (0) must match the size of tensor b (1536) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([0, 4096]), vit_embeds.shape=torch.Size([1536, 4096])
08/27 12:01:07 - mmengine - INFO - Iter(train) [ 30/21336] lr: 9.0786e-07 eta: 13:38:53 time: 1.1986 data_time: 0.0164 memory: 22315 loss: 2.0322
warning: The size of tensor a (3328) must match the size of tensor b (3584) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([3328, 4096]), vit_embeds.shape=torch.Size([3584, 4096])
warning: The size of tensor a (256) must match the size of tensor b (512) at non-singleton dimension 0, input_embeds[selected].shape=torch.Size([256, 4096]), vit_embeds.shape=torch.Size([512, 4096])
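For context, these warnings come from the step that scatters the ViT embeddings into the text embedding sequence at the image placeholder positions: when the text side reserves fewer (or zero) placeholder positions than there are vision embeddings, the assignment cannot broadcast, which is exactly the `input_embeds[selected].shape` vs `vit_embeds.shape` mismatch in the log. Below is a minimal sketch of that mechanism, not the actual InternVL/xtuner code; the placeholder token id and token counts are made-up illustrative values, and only the 4096 hidden size mirrors the log.

```python
import torch

IMG_CONTEXT_TOKEN_ID = 92546   # hypothetical placeholder token id (assumption)
HIDDEN_SIZE = 4096             # LLM hidden size, as in the shapes in the log


def splice_vision_features(input_ids, input_embeds, vit_embeds):
    """Scatter the ViT embeddings into the text embedding sequence at the
    positions occupied by the image placeholder token."""
    selected = input_ids == IMG_CONTEXT_TOKEN_ID
    n_placeholders = int(selected.sum())
    n_vision_tokens = vit_embeds.shape[0]
    if n_placeholders != n_vision_tokens:
        # Mirrors the situation in the log: the text reserved fewer (or zero)
        # placeholder positions than there are vision embeddings, so the
        # row-wise assignment below would fail and a warning is printed instead.
        print(f'warning: placeholders={n_placeholders}, '
              f'vision tokens={n_vision_tokens}')
        return input_embeds
    input_embeds[selected] = vit_embeds
    return input_embeds


# A sample whose text only reserved 128 placeholder slots for 256 vision tokens,
# e.g. because some turns omit the <image> expansion or it was truncated.
vit_embeds = torch.randn(256, HIDDEN_SIZE)
input_ids = torch.ones(1024, dtype=torch.long)
input_ids[10:10 + 128] = IMG_CONTEXT_TOKEN_ID
input_embeds = torch.randn(1024, HIDDEN_SIZE)
splice_vision_features(input_ids, input_embeds, vit_embeds)
```

The practical implication is that the number of placeholder tokens produced by the rendered conversation must equal the number of vision embeddings for that sample, so how (and in how many turns) the <image> tag appears directly determines whether these warnings fire.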