Open zhang2514yuchi opened 3 months ago
When debugging, I found that the shape of cur_image_features is torch.Size([1177, 1024]). Does this mean that a single encoded image occupies 1177 tokens? Is this number fixed?