Meituan-AutoML / MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices
Apache License 2.0

Can MobileVLM v2 receive multiple images at once? #37

Open wuwu-C opened 8 months ago

wuwu-C commented 8 months ago

Also, can I give it the conversation history to achieve in-context inference?

wyddmw commented 7 months ago

https://github.com/Meituan-AutoML/MobileVLM/blob/688fdec914810485c8766da96c63d9d2ce15f750/mobilevlm/model/mobilevlm.py#L100 According to this implementation, MobileVLM can receive multiple images at once by default. However, you need to modify the dataloader so that it either loads the input images as a list or adds an extra dimension to the image tensor.
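For illustration, the dataloader change described above could look roughly like the sketch below. It is not code from the repository: `Sample`, `collate_multi_image`, and the `"<image>"` placeholder string are hypothetical names chosen for this example, and it assumes a LLaVA-style prompt in which each placeholder corresponds to one image. The images are opaque objects here; in real code they would be preprocessed tensors, and each per-sample list could be stacked into one tensor to get the "additional dimension" variant.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed placeholder string (LLaVA-style); check the repo's constants before relying on it.
IMAGE_TOKEN = "<image>"


@dataclass
class Sample:
    """Hypothetical dataset item: one prompt plus the images it refers to."""
    prompt: str
    images: List[Any]  # preprocessed image tensors in real code


def collate_multi_image(samples: List[Sample]) -> Dict[str, list]:
    """Collect each sample's images into its own list.

    Raises ValueError when the number of image placeholders in a prompt
    does not match the number of images supplied for that sample.
    """
    prompts: List[str] = []
    image_lists: List[List[Any]] = []
    for idx, sample in enumerate(samples):
        n_placeholders = sample.prompt.count(IMAGE_TOKEN)
        if n_placeholders != len(sample.images):
            raise ValueError(
                f"sample {idx}: {n_placeholders} placeholder(s) "
                f"but {len(sample.images)} image(s)"
            )
        prompts.append(sample.prompt)
        # Keep one list per sample; with tensors, each list could instead be
        # stacked along a new leading dimension.
        image_lists.append(list(sample.images))
    return {"prompts": prompts, "images": image_lists}


batch = collate_multi_image([
    Sample(f"{IMAGE_TOKEN}\n{IMAGE_TOKEN}\nWhat differs between these?", ["img_a", "img_b"]),
    Sample(f"{IMAGE_TOKEN}\nDescribe the picture.", ["img_c"]),
])
```

Whether the model code accepts this exact structure depends on how the linked function unpacks its `images` argument, so the output format would need to be matched against that implementation.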