LLaVA-VL / LLaVA-NeXT

Apache License 2.0

The number of tokens occupied by the image encoding #68

Open zhang2514yuchi opened 3 months ago

zhang2514yuchi commented 3 months ago

When debugging, I found that the shape of cur_image_features is torch.Size([1177, 1024]). I want to confirm whether this means that a single image's encoding occupies 1177 tokens. Is this number fixed?
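For reference, here is one way the number 1177 could arise. This is a sketch under assumptions that are not confirmed anywhere in this thread: that the vision encoder yields a 27 x 27 grid of base features (729 tokens), that the high-resolution features are cropped back to the image's aspect ratio (16 rows by 27 columns in this guess), and that one newline token is appended to each remaining row. The function name and its parameters are hypothetical and are not taken from the repository. If this reading is right, the first dimension depends on the input image's size and aspect ratio, so it would not be fixed.

```python
def count_image_tokens(base_side: int, rows: int, cols: int,
                       newline_per_row: bool = True) -> int:
    """Count tokens for one image under the assumptions stated above.

    base_side: side length of the base feature grid (assumed square).
    rows, cols: size of the high-resolution feature grid after cropping
                back to the image's aspect ratio (assumed values).
    newline_per_row: whether one extra token is appended to each row.
    """
    base_tokens = base_side * base_side
    tokens_per_row = cols + (1 if newline_per_row else 0)
    return base_tokens + rows * tokens_per_row


# Assumed values only: 27 x 27 base grid, 16 x 27 high-resolution grid.
print(count_image_tokens(27, 16, 27))  # 729 + 16 * 28 = 1177
```

Changing rows or cols (for example, by feeding an image with a different aspect ratio) changes the total, which is an easy way to check in a debugger whether the count varies per image.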