[Question] Multi-image collate_fn

PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

https://arxiv.org/abs/2401.15947

Apache License 2.0

1.9k stars 121 forks

Open PangziZhang523 opened 4 months ago

PangziZhang523 commented 4 months ago

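Suppose the batch size is 6 and both the images and the videos are written into new_images, so that new_images has shape [45, 3, 224, 224]. How can we then tell which images correspond to which conversation? Any help would be appreciated.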
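
To make the bookkeeping problem concrete, here is a minimal sketch of one generic way to keep the mapping: record how many frames each sample contributes before flattening, then use those counts to split the flat batch again. This is not taken from the MoE-LLaVA code and may differ from what the repository actually does. The helper names `flatten_with_counts` and `split_by_counts` are hypothetical, and the per-sample frame counts are invented so that they sum to 45; the question does not say how the 45 entries are distributed. Strings stand in for the `[3, 224, 224]` tensors.

```python
from itertools import accumulate


def flatten_with_counts(per_sample_frames):
    """Flatten per-sample frame lists and record how many frames each sample had."""
    counts = [len(frames) for frames in per_sample_frames]
    flat = [frame for frames in per_sample_frames for frame in frames]
    return flat, counts


def split_by_counts(flat, counts):
    """Recover the per-sample groups from the flat list using the recorded counts."""
    ends = list(accumulate(counts))   # exclusive end offset of each sample
    starts = [0] + ends[:-1]          # start offset of each sample
    return [flat[s:e] for s, e in zip(starts, ends)]


# Hypothetical batch of 6 conversations; the counts are an assumption
# chosen only so that they total 45 (1 + 8 + 8 + 10 + 8 + 10).
frame_counts = [1, 8, 8, 10, 8, 10]
batch = [
    [f"sample{i}_frame{j}" for j in range(n)]  # placeholder for image tensors
    for i, n in enumerate(frame_counts)
]

flat, counts = flatten_with_counts(batch)   # flat plays the role of new_images
groups = split_by_counts(flat, counts)      # groups[i] belongs to conversation i
```

With real tensors the same idea would presumably apply: if the counts are kept next to the flattened tensor, `torch.split(new_images, counts, dim=0)` returns one chunk per conversation, in batch order.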