Closed ChristianWeyer closed 1 month ago
Hi! Same here. I tried to reproduce the inference of the online demo using the llama.cpp fork, and the quality is much worse (using the same params). Since Ollama is based on llama.cpp, I suspect these two problems are the same. I tried both the Q4 and FP16 GGUF versions. Thanks!
@ChristianWeyer @hvico Sorry, we have found the problem. There were some inconsistencies between our Python code and our original idea. Our C++ code was written according to our understanding, but because the Python version behaves slightly differently, we could only reach the same accuracy level by modifying the C++ code (in this commit). I have restored the accuracy in the official branch. You can use https://github.com/OpenBMB/llama.cpp/tree/prepare-PR-of-minicpm-v2.5. This branch gives the C++ accuracy closest to the Python version. Regarding Ollama, I promise to solve it tomorrow.
Where can we find the updated stuff @tc-mb @Cuiunbo ?
I have modified it, and you should now be able to get good enough results with MiniCPM-V 2.5. I am sorry that I did not inform you in time after making the modification.
What has changed? What does the setup look like in the meantime? 🙂 Which tools and which model file?
I would still like to use Ollama.
Referring to https://github.com/OpenBMB/ollama/issues/3#issuecomment-2260209553
:-)
Thanks @tc-mb!