SHI-Labs / CuMo

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
Apache License 2.0

What should the input image size be for CuMo? #2

Closed leoozy closed 1 month ago

chrisjuniorli commented 1 month ago

The input image can be any size. The data loader resizes it to 336x336 before sending it to CLIP.
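A minimal sketch of the behavior described above, not the repository's code: any input size ends up as a fixed 336x336 grid. The names `resize_nearest` and `make_image` are hypothetical, and nearest-neighbor sampling is only a stand-in; the interpolation method and any padding or cropping actually used are determined by CuMo's data loader.

```python
TARGET_SIZE = 336  # side length stated in the reply above


def resize_nearest(pixels, size=TARGET_SIZE):
    """Resize a 2D grid (list of rows) to size x size.

    Uses nearest-neighbor sampling as an illustrative stand-in for
    whatever interpolation the real data loader applies.
    """
    src_h = len(pixels)
    src_w = len(pixels[0])
    return [
        # Map each output coordinate back to a source coordinate.
        [pixels[(y * src_h) // size][(x * src_w) // size] for x in range(size)]
        for y in range(size)
    ]


def make_image(height, width):
    """Build a synthetic single-channel image of the given size."""
    return [[(y + x) % 256 for x in range(width)] for y in range(height)]


if __name__ == "__main__":
    # Inputs of different sizes all come out with the same fixed shape.
    for h, w in [(480, 640), (100, 50), (1024, 1024)]:
        out = resize_nearest(make_image(h, w))
        print((h, w), "->", (len(out), len(out[0])))
```

The point of the sketch is only that the caller does not need to pre-resize images: both smaller and larger inputs are mapped to the same fixed shape.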

leoozy commented 1 month ago

Thanks.

leoozy commented 1 month ago

[image]

Hello, I noticed that the paper implements multi-resolution input. Does this implementation have that feature? Thanks!

chrisjuniorli commented 1 month ago

Yes, see https://github.com/SHI-Labs/CuMo/blob/main/cumo/model/multimodal_encoder/clip_encoder.py for the implementation details.

leoozy commented 1 month ago

Thanks!