OpenBMB / ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models.
https://ollama.com
MIT License

degraded performance with ollama build on macOS #13

Open jackxwu opened 3 weeks ago

jackxwu commented 3 weeks ago

What is the issue?

I built the ollama binary on macOS from source by following step 3 ("Rebuild ./ollama binary file") of the instructions. The build works, but the model is not able to extract information from the image correctly.
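For reference, the rebuild I performed corresponds roughly to the standard from-source build; this is a sketch based on my reading of the upstream README (assumes Go and cmake are installed), not the authoritative instructions:

```shell
# Sketch of the from-source build on macOS; exact steps may differ by branch.
git clone https://github.com/OpenBMB/ollama.git
cd ollama
go generate ./...   # builds the bundled llama.cpp runners
go build .          # produces the ./ollama binary
./ollama serve
```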

result on MacOS

extract product information from this image, /Users/jack.wu/Desktop/breville1.png, provide product name, description, price and list price

Added image '/Users/jack.wu/Desktop/breville1.png'
Product: Nespresso Coffee Machine & Accessories
Description: The image shows a range of Nespresso coffee machines and accessories that include the machine, capsules, and possibly related products. These items are likely displayed in an online retail environment.
Price: The price information is not visible in the image provided.
List Price: Similarly, there is no list price visible in the image.

do you see any number in the image?

Yes, there are numbers visible in the image. The number "4.2" is seen next to a star rating graphic, which indicates that this product has a 4.2-star rating based on 240 ratings. Additionally, there is a price of "$699.95" displayed for one of the Nespresso products shown in the image.

is this a discount of the original price? what is the original price?

Yes, this appears to be a discount of the original price. The original list price is "$699.95," and there's a strike-through indicating that the

result on huggingface

[Screenshot 2024-06-07 at 5:17:21 PM]

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

No response

jackxwu commented 3 weeks ago

The server log shows the following error. The "missing pre-tokenizer type, using: 'default'" error seems to be a known issue:

llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:
llm_load_vocab: ****
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ****
llm_load_vocab:
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)

jackxwu commented 3 weeks ago

I ran the server again by following the instructions on this page: https://github.com/OpenBMB/ollama/tree/minicpm-v2.5/examples/minicpm-v2.5. I downloaded the models from Hugging Face.

The model performs better but still missed the original list price of $749.95. Note that the model hosted by Hugging Face was able to extract everything from the image, including the $749.95. Is this expected behavior, i.e. that model inference quality is better when the model runs on a GPU?

The following is copied from the command line:

extract product information from this image, provide product name, description and price

The product featured in the image is a "Nespresso Breville VesO850BS". It's described as a 4.2 star-rated item with 240 reviews. The current price for this product is listed as $699.95, but there's an ongoing discount of -7% off, which brings the price down to $699.95. This information suggests that it's likely a high-end coffee machine from the Nespresso brand.

extract the following information from this image, product title, current price, list price

Product Title: Nespresso Breville VesO850BS
Current Price: $699.95
List Price (before discount): $699.95

jackxwu commented 3 weeks ago

Models Q4_K_M, Q8_0, and F16 all exhibit the same degraded-performance problem.

My guess is that the problem indicated by the following error message caused the degraded performance:

llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:
llm_load_vocab: ****
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ****
llm_load_vocab:
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
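This warning is plausible as the cause: the pre-tokenizer is the regex-based pre-split applied to raw text before BPE merges, and falling back to 'default' changes token boundaries. A minimal sketch of the idea (the patterns below are simplified illustrations, not the actual Llama 3 or MiniCPM pre-tokenizer regexes):

```python
import re

# Hypothetical, simplified pre-tokenizer patterns for illustration only.
# A GPT-2-style pre-split keeps a leading space attached to each chunk and
# isolates punctuation; a naive whitespace split does neither.
gpt2_like = re.compile(r" ?\w+| ?[^\w\s]+|\s+")

text = "Price: $699.95"

pre_split = gpt2_like.findall(text)
naive_split = text.split()

print(pre_split)    # ['Price', ':', ' $', '699', '.', '95']
print(naive_split)  # ['Price:', '$699.95']
```

Because BPE merges only operate within these pre-split chunks, a GGUF that is missing its pre-tokenizer metadata (the `tokenizer.ggml.pre` field) can end up tokenizing prices and numbers differently from how the model was trained, which would match the degraded extraction seen above.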

jackxwu commented 3 weeks ago

Is this issue related? https://huggingface.co/failspy/llama-3-70B-Instruct-abliterated-GGUF/discussions/2