Closed by leoozy 1 month ago
Thanks.
Hello, I noticed that in the paper you implement multi-resolution input. Does this implementation have this feature? Thanks!
Yes, see https://github.com/SHI-Labs/CuMo/blob/main/cumo/model/multimodal_encoder/clip_encoder.py for the implementation details.
Thanks!
The input image can be any size; the data loader resizes it to 336x336 and then sends it to CLIP.
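For anyone wondering what "any size in, 336x336 out" looks like in practice, here is a rough, dependency-free sketch. It is not CuMo's data loader code: `resize_nearest` is a hypothetical helper, and nearest-neighbor sampling is used only to keep the example short. A real pipeline would most likely use an image library with proper interpolation and normalization, and the thread does not say whether the aspect ratio is preserved, so this sketch simply stretches the image.

```python
# Hypothetical sketch, NOT the CuMo data loader.
TARGET = 336  # side length mentioned in the thread


def resize_nearest(pixels, target=TARGET):
    """Resize a 2D grid of pixels (a list of rows) to target x target.

    Uses nearest-neighbor sampling and does not preserve aspect ratio
    (an assumption; the thread does not specify this).
    """
    src_h = len(pixels)
    src_w = len(pixels[0])
    out = []
    for y in range(target):
        # Map each output row back to the closest source row.
        row = pixels[y * src_h // target]
        # Map each output column back to the closest source column.
        out.append([row[x * src_w // target] for x in range(target)])
    return out


# Inputs of different sizes all come out as 336x336.
small = [[(x + y) % 256 for x in range(100)] for y in range(50)]
large = [[(x + y) % 256 for x in range(1024)] for y in range(768)]
for img in (small, large):
    resized = resize_nearest(img)
    assert len(resized) == TARGET and len(resized[0]) == TARGET
```

The point is only that the encoder always sees a fixed-size input regardless of the original resolution; for the actual preprocessing, the linked clip_encoder.py is the place to look.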