NExT-ChatV / NExT-Chat

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".
Apache License 2.0

Question about image_token_len in config #16

Closed: 2285443514 closed this issue 7 months ago

2285443514 commented 7 months ago

Amazing code! It seems that in config/model/nextchat.py, image_token_len is set to 576, which corresponds to the 336x336 CLIP. If I want to use a 224x224 CLIP, should I modify it? In other words, I trained an mm_projector.bin from a 224x224 CLIP and Vicuna-1.5 using LLaVA's code. How do I use it in the NExT-Chat code: should I modify image_token_len in the config, or does something else need to be done? Sincerely looking forward to your reply.
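
For reference, the link between input resolution and image_token_len can be sketched as below. This is only a rough illustration: it assumes a ViT-L/14 style CLIP (patch size 14) and that only patch tokens reach the projector, with the CLS token dropped as LLaVA does. The helper name num_image_tokens is hypothetical and is not part of the NExT-Chat code.

```python
def num_image_tokens(image_size: int, patch_size: int = 14) -> int:
    """Patch tokens produced by a ViT for a square image (CLS token excluded).

    Assumption: patch_size=14, as in CLIP ViT-L/14.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    side = image_size // patch_size  # patches along one edge
    return side * side


print(num_image_tokens(336))  # 24 * 24 patches
print(num_image_tokens(224))  # 16 * 16 patches
```

Under these assumptions, 336x336 gives the 576 found in the config and 224x224 would give 256, so image_token_len would probably need to change along with the vision tower and its image preprocessing.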

RafeeqShodeinde commented 3 months ago

Hello, could you please share what solved the problem of using the 224x224 CLIP model with NExT-Chat?

I am getting this error: RuntimeError: stack expects each tensor to be equal size, but got [3, 224, 224] at entry 0 and [3, 336, 336] at entry 1
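
This error says that a single batch contains images preprocessed to two different resolutions. One possible cause (an assumption, not confirmed in this thread) is that part of the pipeline still uses a 336x336 image processor while another part uses the 224x224 one. A minimal sketch for finding which samples have which shape, using plain shape tuples and a hypothetical helper group_by_shape:

```python
def group_by_shape(shapes):
    """Map each distinct tensor shape to the batch indices that have it."""
    groups = {}
    for idx, shape in enumerate(shapes):
        groups.setdefault(tuple(shape), []).append(idx)
    return groups


# Shapes taken from the error message above.
batch_shapes = [(3, 224, 224), (3, 336, 336)]
groups = group_by_shape(batch_shapes)
if len(groups) > 1:
    # More than one shape means the batch cannot be stacked as-is.
    for shape, indices in groups.items():
        print(f"shape {shape}: samples {indices}")
```

In a real run the shapes would come from each image tensor (for example tuple(t.shape)) just before the batch is collated; tracing the odd samples back to their preprocessing step should show where the second resolution comes from.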