OpenBMB / ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models.
https://ollama.com
MIT License

Ollama model has significantly lower answer quality than the online demo #3

Open ChristianWeyer opened 1 month ago

ChristianWeyer commented 1 month ago

What is the issue?

I created an Ollama model (for the fp16 GGUF) based on this: https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5
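
(For reference, the create step I used looks roughly like this; the model name and file paths are illustrative, and the Modelfile follows the linked example, which points at the fp16 main GGUF and the mmproj vision projector:)

./ollama create minicpm-v2.5 -f examples/minicpm-v2.5/Modelfile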

When testing one of my sample forms images, I get bad/wrong results when running the model locally via Ollama.

./ollama run minicpm-v2.5
>>> How is the Ending Balance? ./credit-card-statement.jpg
Added image './credit-card-statement.jpg'
The Ending Balance is 8,010.

I get the perfect and correct answers when using the same forms image in the online demo: https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5

[screenshot attached]

What can we do to get the same quality here locally?

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

Latest git commit (367ec3f)

tc-mb commented 1 month ago

I will test it and reply to you as soon as possible.

rhchenxm commented 1 month ago

Thanks for releasing an ollama-compatible version so quickly! After testing on macOS I ran into the same problem: across multiple test images the local model hallucinates so badly that it is nearly unusable. I'm not sure where things went wrong.

[screenshots attached]
ChristianWeyer commented 1 month ago

If we can get this model to run locally with the same quality as the online demo, this will be killer!

tc-mb commented 1 month ago

There was an image-encoding bug in the previous code. Below is the result with the latest code, and it looks okay.

[screenshot attached]

ChristianWeyer commented 1 month ago

Nice. Do I need to download the updated GGUF @tc-mb ?

tc-mb commented 1 month ago

No. I haven't changed gguf.

ChristianWeyer commented 1 month ago

OK, what exactly do I have to do now to test the changes? Thx!

tc-mb commented 1 month ago

I think you should re-follow the README: pull the latest code, rebuild, update the modelfile, and run ollama again (a rough command sketch is below). Remember to modify the modelfile, because the input order of the model files has changed. If you have any questions, feel free to ask; I will reply ASAP.
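
A rough sketch of those steps (a minimal outline; the exact build commands and paths come from the fork's README and may differ on your platform, and the model name here is illustrative):

cd ollama && git pull                  # update the minicpm-v2.5 fork
go generate ./... && go build .        # rebuild the ollama binary as described in the README
./ollama create minicpm-v2.5 -f examples/minicpm-v2.5/Modelfile   # recreate the model with the updated modelfile
./ollama run minicpm-v2.5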

ChristianWeyer commented 1 month ago

@tc-mb OK, I pulled the latest commit. Built, tested...

It is much better, but still not correct. Hmm... [screenshot attached]

I also tested with other forms images; the results were better, but it always got something wrong in the end.

tc-mb commented 1 month ago

Sorry, the previous code had a bug in image encoding; I have fixed it. Below is the result of testing again on a screenshot of your image. It doesn't look too bad, but there are still flaws.

Also, we had previously uploaded only one quantized version, ggml-model-Q4_K_M.gguf; we are still uploading more GGUF precision versions. Evaluating the accuracy of quantized models exported with llama.cpp requires rewriting the evaluation in C++, so we have not yet evaluated every dataset the way we did for the Python version. We will continue to validate the performance of the quantized models exported from llama.cpp and release the results later, so the community can choose a suitable GGUF version.

[screenshot attached]

ChristianWeyer commented 1 month ago

@tc-mb So I think I do have the latest code now.

Are you saying that I need a new F16 GGUF from https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5-gguf/tree/main - and just have to wait until it has been uploaded...?

tc-mb commented 1 month ago

You can continue to use the previous GGUF; we haven't updated that precision version.

llama.cpp supports 20-30 different precision variants, and we had only exported 2 of them before. Since people have asked for more versions, we are still uploading them one after another.
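
(For context, each precision variant is exported from the f16 GGUF with llama.cpp's quantize tool; a minimal sketch, assuming the tool name used by llama.cpp at the time and illustrative file paths:)

./quantize ./model/ggml-model-F16.gguf ./model/ggml-model-Q4_K_M.gguf Q4_K_M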

ChristianWeyer commented 1 month ago

OK, cool.

So, any idea what might still be wrong, given that I'm not getting correct results?

tc-mb commented 1 month ago

I think we can look at two things first.

1. git branch

If it's convenient, could you check the branches and versions of ollama and llama.cpp you're using? ollama depends on llama.cpp, and with mismatched versions it can still run, but accuracy is seriously degraded. You can run git log --oneline in each folder to see where the latest HEAD is (see the command sketch after this list). Here are the results on my side, which you can compare against.

ollama: it should be here: your_ollama_dir [screenshot attached]
llama.cpp: it should be here: your_ollama_dir/llm/llama.cpp [screenshot attached]

2. Original image

Maybe the difference lies in the image itself. You could also send the original image, and I'll check whether it differs from the screenshot I used.
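
A minimal sketch of the check in point 1 (your_ollama_dir is the ollama checkout; llm/llama.cpp is the vendored llama.cpp inside it, and -1 just limits the output to the most recent commit):

cd your_ollama_dir
git branch                # should show minicpm-v2.5 as the current branch
git log --oneline -1      # latest commit of the ollama fork
cd llm/llama.cpp
git log --oneline -1      # latest commit of the vendored llama.cpp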

ChristianWeyer commented 1 month ago

git log --oneline

Same for me:

[screenshots attached]

All the images I am testing produce wrong results. I can of course send you the images. Where and how?

ChristianWeyer commented 1 month ago

Something is still quite wrong @tc-mb:

[screenshot attached]

The demo instance (https://huggingface.co/spaces/openbmb/MiniCPM-Llama3-V-2_5) got this all right.

[screenshots attached]
ChristianWeyer commented 1 month ago

I am happy to help make MiniCPM-Llama3-V-2_5 the best local VLM on earth @tc-mb 🙂. What further kinds of tests could we do to improve the quality of the results?

rhchenxm commented 1 month ago

Thanks for your quick response and fixes! I just re-pulled all the code, recompiled, and ran it as instructed. Since earlier replies said the GGUF files were not updated, I did not re-download the GGUF; I am using the Q8_0 GGUF that I quantized from f16 myself two days ago, together with the previous mmproj file. Testing with those files shows improvement, but there is still a gap compared with the online version. Could that be due to the Q8 quantization?

[screenshot attached]
Fertony commented 1 month ago

I noticed this too: running the minicpm-v model locally with ollama gives much worse results than the demo. I'm not sure whether something was lost when converting the model to GGUF or whether it is an image-encoding problem. I plan to run the original model locally and see how it performs.

Fertony commented 1 month ago

Update: running the demo locally with the original model works well. Running it with llama.cpp or ollama both produces fairly severe hallucinations or incorrect results.

tc-mb commented 1 month ago

Could you share the branches of the two forked repos you are using and your modelfile? I'd like to double-check.

Fertony commented 1 month ago

The modelfile is as follows:

FROM ../MiniCPM-V-2_5/model/ggml-model-Q4_K_M.gguf
FROM ../MiniCPM-V-2_5/mmproj-model-f16.gguf

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER num_keep 4
PARAMETER num_ctx 2048

One more note: with the minicpm-v2.5 branch of ollama, I rebuilt the docker image and the build itself went fine, but after the container starts, running the model created from this modelfile hangs. GPU memory is occupied, yet nvitop shows the GPU is not actually being used. Running the llama3 model works normally.

tc-mb commented 1 month ago

(venv) root@DESKTOP-MEGRI2B:/mnt/h/ollama# git branch

* minicpm-v2.5

An earlier version of the code had a misalignment that caused a bug in the image/vision path, which I fixed afterwards. Could you check the output of the command below: git log --oneline, or pull the latest code. If you have already confirmed you are running the latest code, please post some of the images in this issue and I will try them locally.

Fertony commented 1 month ago

[screenshot attached]

Fertony commented 1 month ago

[screenshots attached] The first is from running the demo, the second from running with ollama.

tc-mb commented 1 month ago

Maybe I'm not following; where is the problem here? The four words "matcha coconut granola crisp" should correspond to 抹茶 (matcha), 椰子 (coconut), 燕麦 (oat/granola), 脆 (crisp). The answer seems correct? It's just not the same as the demo's answer?

tc-mb commented 1 month ago

Sorry, I was sorting out the code yesterday; a PR was submitted to the upstream llama.cpp a few hours ago. You can post the picture in this issue, or send it to my email "caitinachi@modelbest.cn".

tc-mb commented 1 month ago

I'm not sure whether ollama's streaming feature conflicts with the way I wrote the integration; it will take longer to find out.

You can try asking the questions one at a time in the following way, to see whether each individual answer has a problem.

./ollama create minicpmv -f openbmb/Modelfile

./ollama run minicpmv "{your question}" {image_path}

For example: ./ollama run minicpmv "How is the Ending Balance?" /Users/a0/Pictures/20240528-012828.jpeg

ChristianWeyer commented 1 month ago

Here we go :-).

[image attached: credit-card-statement]

tc-mb commented 1 month ago

I'm not sure if the streaming feature in ollama conflicts with the way I write it, but it takes longer to find out. You can try to ask questions separately in the following ways to see if there is a problem with each answer. ./ollama run test "{your question}" {image_path}. For example: ./ollama run test "How is the Ending Balance?" /Users/a0/Pictures/20240528-012828.jpeg

Ah, OK. Where does the model name go in the command...?

OK, I have modified the reply above; a command to create the model has been added, which should be easy for you to follow.

ChristianWeyer commented 1 month ago

Uh, this is completely nuts...

./ollama run minicpm-v2.5:latest "How is the Ending Balance" "./credit-card-statement.jpg"
Added image './credit-card-statement.jpg'
The Ending  value is 0.
tc-mb commented 1 month ago

The image path should not be enclosed in double quotes; that is the format defined by ollama.

ChristianWeyer commented 1 month ago

OK.

The result is completely wrong :-).

./ollama run minicpm-v2.5:latest "How is the Ending Balance" ./credit-card-statement.jpg
Added image './credit-card-statement.jpg'
3,448.10
ChristianWeyer commented 3 weeks ago

... do we have any idea where to go from here, @tc-mb? Thanks!