PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0
1.9k stars 121 forks

[Question] Multi-image collate_fn #76

Open PangziZhang523 opened 4 months ago

PangziZhang523 commented 4 months ago

Question

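[image]

Here, assume the batch size is 6. Both the images and the videos are written into new_images, so the shape of new_images is [45, 3, 224, 224]. In that case, how do we know which image corresponds to which conversation? Any help would be appreciated.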
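To make the question concrete, here is a minimal sketch of one common way to recover the mapping. It is not taken from the MoE-LLaVA code. It assumes that each conversation's token ids contain one placeholder id per image or video frame, and that the flattened new_images keeps the samples in batch order. The names `IMAGE_TOKEN_INDEX`, `count_images_per_sample`, and `split_by_counts` are hypothetical, and the per-sample counts in the example are invented so that they sum to 45.

```python
from itertools import accumulate

# Hypothetical placeholder id marking "an image goes here" in the token ids.
IMAGE_TOKEN_INDEX = -200


def count_images_per_sample(batch_input_ids, image_token_index=IMAGE_TOKEN_INDEX):
    """Count the image placeholders in each conversation of the batch."""
    return [
        sum(1 for token in ids if token == image_token_index)
        for ids in batch_input_ids
    ]


def split_by_counts(flat_items, counts):
    """Split a flat, batch-ordered sequence back into one group per sample."""
    if sum(counts) != len(flat_items):
        raise ValueError(
            f"counts sum to {sum(counts)} but there are {len(flat_items)} items"
        )
    # Cumulative offsets: sample i owns flat_items[offsets[i]:offsets[i + 1]].
    offsets = [0, *accumulate(counts)]
    return [flat_items[start:end] for start, end in zip(offsets, offsets[1:])]


# Invented example: 6 conversations whose image/frame counts sum to 45.
counts = [8, 8, 8, 8, 8, 5]
flat_images = list(range(45))  # stand-in for the 45 entries of new_images
groups = split_by_counts(flat_images, counts)
```

Under these assumptions, `groups[i]` holds the images for conversation `i`. The same slicing works on the first dimension of a tensor, and with PyTorch `torch.split(new_images, counts)` performs the equivalent split. If the collate function does not preserve batch order or the placeholders are not one per image, the counts would have to be recorded explicitly when the batch is built.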