Open wuwu-C opened 8 months ago
https://github.com/Meituan-AutoML/MobileVLM/blob/688fdec914810485c8766da96c63d9d2ce15f750/mobilevlm/model/mobilevlm.py#L100 According to this implementation, MobileVLM can accept multiple images at once by default, but you need to modify the dataloader so that it passes the input images as a list or adds an extra dimension to the image tensor.
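For illustration, here is a rough sketch of what the dataloader change might look like. It is not taken from the MobileVLM repo: `collate_multi_image`, `flatten_with_sizes`, and `split_by_sizes` are hypothetical helper names, and plain strings stand in for preprocessed image tensors so the snippet runs without any dependencies. The idea is to keep one list of images per sample, then flatten them for a single encoder pass while remembering how many images belong to each sample.

```python
from typing import Any, Dict, List, Tuple


def collate_multi_image(batch: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical collate function: keep each sample's images in its own list."""
    return {
        "prompts": [sample["prompt"] for sample in batch],
        # One inner list per sample; samples may hold different numbers of images.
        "images": [list(sample["images"]) for sample in batch],
    }


def flatten_with_sizes(images: List[List[Any]]) -> Tuple[List[Any], List[int]]:
    """Flatten per-sample image lists and record how many images each sample had."""
    flat = [img for group in images for img in group]
    sizes = [len(group) for group in images]
    return flat, sizes


def split_by_sizes(flat: List[Any], sizes: List[int]) -> List[List[Any]]:
    """Regroup a flat list (e.g. encoded image features) back into per-sample lists."""
    groups, start = [], 0
    for n in sizes:
        groups.append(flat[start:start + n])
        start += n
    return groups


# Placeholder strings stand in for image tensors (assumption for this sketch).
batch = [
    {"prompt": "Compare the two pictures.", "images": ["img_a", "img_b"]},
    {"prompt": "Describe the picture.", "images": ["img_c"]},
]
collated = collate_multi_image(batch)
flat, sizes = flatten_with_sizes(collated["images"])
regrouped = split_by_sizes(flat, sizes)
```

With real tensors, the inner lists would hold image tensors of identical shape, and stacking them per sample would give the "additional dimension" variant instead of a list.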
Also, can I give it the conversation history to achieve in-context inference?