Hello, could you please share what solved the problem of using the 224x224 CLIP model with nextchat?
I am getting this error: `RuntimeError: stack expects each tensor to be equal size, but got [3, 224, 224] at entry 0 and [3, 336, 336] at entry 1`
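If I understand correctly, this is PyTorch's generic `torch.stack` shape-mismatch error, which shows up whenever images preprocessed at different resolutions end up in the same batch. Here is a minimal reproduction outside of nextchat (the tensor contents are just placeholders):

```python
import torch

# Two "images" preprocessed at different resolutions: one at
# 224x224 (a 224 CLIP preprocessor) and one at 336x336 (the
# default 336 CLIP preprocessor).
img_224 = torch.zeros(3, 224, 224)
img_336 = torch.zeros(3, 336, 336)

# torch.stack requires every tensor to have the same shape, so
# batching mixed-resolution images raises:
#   RuntimeError: stack expects each tensor to be equal size,
#   but got [3, 224, 224] at entry 0 and [3, 336, 336] at entry 1
batch = torch.stack([img_224, img_336])
```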
Amazing code! It seems that in config/model/nextchat.py, image_token_len is set to 576, which corresponds to the 336x336 CLIP model. If I want to use a 224x224 CLIP, should I modify it? In other words, I trained an mm_projector.bin from a 224x224 CLIP and Vicuna-1.5 using LLaVA's code. To use it with the next-chat code, is changing image_token_len in the config enough, or is there anything else that needs to be done? Sincerely looking forward to your reply.
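If my understanding is right, image_token_len is just the number of patch tokens the vision encoder emits, so for a ViT-L/14 backbone (patch size 14, which I assume is what both CLIP checkpoints use) the arithmetic would be:

```python
def image_token_len(image_size: int, patch_size: int = 14) -> int:
    """Number of patch tokens a ViT produces for a square image."""
    return (image_size // patch_size) ** 2

print(image_token_len(336))  # 576 -> the default in config/model/nextchat.py
print(image_token_len(224))  # 256 -> what a 224x224 CLIP would presumably need
```

So I would expect 256 for the 224x224 model, assuming the config value is the only place the token count is hardcoded.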