OpenBMB / llama.cpp

Port of Facebook's LLaMA model in C/C++

Why is the output of siglip+resampler different between torch and llama.cpp? #18

Closed · Xwmiss closed this 1 month ago

Xwmiss commented 1 month ago

Hi, first of all, thanks for your work. I find that when I use the same picture as input to the torch model and the llama.cpp model, the result of the siglip+resampler part (shape (1, 96, 4096)) is different: the cosine similarity is only 0.75. I am confused now.
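For reference, a minimal sketch of how such a comparison could be done, assuming both embeddings have been dumped to disk first (the file names, and the idea of dumping the llama.cpp tensor as raw float32, are hypothetical, not part of either codebase):

```python
import numpy as np
import torch

# Embedding from the torch pipeline, saved earlier with torch.save (hypothetical path).
emb_torch = torch.load("siglip_resampler_torch.pt").float().reshape(-1)

# Embedding dumped from llama.cpp as raw float32 (hypothetical path and format).
emb_cpp = torch.from_numpy(
    np.fromfile("siglip_resampler_llamacpp.bin", dtype=np.float32)
).reshape(-1)

# Both should hold the full (1, 96, 4096) tensor.
assert emb_torch.numel() == emb_cpp.numel() == 1 * 96 * 4096

# Cosine similarity over the flattened tensors.
cos = torch.nn.functional.cosine_similarity(emb_torch, emb_cpp, dim=0)
print(f"cosine similarity: {cos.item():.4f}")
```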

Xwmiss commented 1 month ago

Even after I change the input_raw in the function "encode_image_with_clip" so that the inputs to torch and llama.cpp are exactly the same, the results of the two inference paths still differ and the cosine similarity is still 0.75.
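One way to confirm the two paths really do see identical inputs is to dump the preprocessed tensor from each side and compare element-wise before comparing embeddings. A sketch, again assuming hypothetical raw-float32 dumps (llama.cpp's input_raw would need to be written out manually):

```python
import numpy as np

# Preprocessed image tensors dumped from each pipeline (hypothetical paths).
inp_torch = np.fromfile("input_torch.bin", dtype=np.float32)
inp_cpp = np.fromfile("input_llamacpp.bin", dtype=np.float32)

# If these differ, the gap is in preprocessing (resize/normalize/layout),
# not in the siglip+resampler weights or the graph itself.
print("max abs diff:", np.abs(inp_torch - inp_cpp).max())
print("allclose:", np.allclose(inp_torch, inp_cpp, atol=1e-6))
```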

Xwmiss commented 1 month ago

And another thing needs to be said: although the result of siglip+resampler is different, the final LLM answer for the image and question is still correct, which confuses me even more....

tc-mb commented 1 month ago

@all Hi, I don't always keep an eye on the issue area of this fork repo.

  1. If this issue still needs an answer, please open an issue in the main repo with the "llamacpp" label, and I will respond very quickly.
  2. If this issue no longer needs an answer, I will close it this week.