LlamaEdge / LlamaEdge

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge
https://llamaedge.com/
Apache License 2.0

bug: Issue with gemma #120

Open alexcardo opened 4 months ago

alexcardo commented 4 months ago

Summary

[You]: Write me a short description of the Adidas brand

[Bot]:

^C

### Reproduction steps

```console
alex@M1 ~ % bash <(curl -sSfL 'https://code.flows.network/webhook/iwYN1SdN3AmPgR5ao5Gt/run-llm.sh')

[I] This is a helper script for deploying LlamaEdge API Server on this machine.

    The following tasks will be done:
    - Download GGUF model
    - Install WasmEdge Runtime and the wasi-nn_ggml plugin
    - Download LlamaEdge API Server

    Upon the tasks done, an HTTP server will be started and it will serve the selected model.

    Please note:
    - All downloaded files will be stored in the current folder
    - The server will be listening on all network interfaces
    - The server will run with default settings which are not always optimal
    - Do not judge the quality of a model based on the results from this script
    - This script is only for demonstration purposes

    During the whole process, you can press Ctrl-C to abort the current process at any time.

    Press Enter to continue ...

[+] Installing WasmEdge ...

    1) Install the latest version of WasmEdge and wasi-nn_ggml plugin (recommended)
    2) Keep the current version

[+] Select a number from the list above: 1

Using Python: /opt/homebrew/bin/python3
INFO - CUDA is only supported on Linux
INFO - CUDA is only supported on Linux
WARNING - Experimental Option Selected: plugins
WARNING - plugins option may change later
INFO - Compatible with current configuration
INFO - Running Uninstaller
WARNING - SHELL variable not found. Using zsh as SHELL
INFO - shell configuration updated
INFO - Downloading WasmEdge
|============================================================|100.00 %
INFO - Downloaded
INFO - Installing WasmEdge
INFO - WasmEdge Successfully installed
INFO - Downloading Plugin: wasi_nn-ggml
|============================================================|100.00 %
INFO - Downloaded
INFO - Run: source /Users/alex/.zshenv
The WasmEdge Runtime is installed in /Users/alex/.wasmedge/bin/wasmedge.

[+] The most popular models at https://huggingface.co/second-state:

    1) Gemma-7b-it-GGUF
    2) Gemma-2b-it-GGUF
    3) Llama-2-7B-Chat-GGUF
    4) stablelm-2-zephyr-1.6b-GGUF
    5) OpenChat-3.5-0106-GGUF
    6) Yi-34B-Chat-GGUF
    7) Yi-34Bx2-MoE-60B-GGUF
    8) Deepseek-LLM-7B-Chat-GGUF
    9) Deepseek-Coder-6.7B-Instruct-GGUF
    10) Mistral-7B-Instruct-v0.2-GGUF
    11) dolphin-2.6-mistral-7B-GGUF
    12) Orca-2-13B-GGUF
    13) TinyLlama-1.1B-Chat-v1.0-GGUF
    14) SOLAR-10.7B-Instruct-v1.0-GGUF

    Or choose one from: https://huggingface.co/models?sort=trending&search=gguf

[+] Please select a number from the list above or enter an URL: 1

[+] Downloading the selected model from https://huggingface.co/second-state/Gemma-7b-it-GGUF/resolve/main/gemma-7b-it-Q5_K_M.gguf
######################################################################### 100.0%

[+] Extracting prompt type: gemma-instruct

[+] No reverse prompt required

[+] Running mode:

    1) API Server with Chatbot web app
    2) CLI Chat

[+] Select a number from the list above: 2

[+] Selected running mode: 2 (CLI Chat)

[+] You already have llama-chat.wasm. Download the latest llama-chat.wasm? (y/n): y

[+] Downloading the latest llama-chat.wasm ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2295k  100 2295k    0     0  1732k      0  0:00:01  0:00:01 --:--:-- 9793k

[+] Will run the following command to start CLI Chat:

    wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-7b-it-Q5_K_M.gguf llama-chat.wasm --prompt-template gemma-instruct

[+] Confirm to start CLI Chat? (y/n): y

********************* LlamaEdge *********************

[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Top-p sampling (1.0 = disabled): 0.9
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] presence penalty (0.0 = disabled): 0
[INFO] frequency penalty (0.0 = disabled): 0
[INFO] Use default system prompt
[INFO] Prompt template: GemmaInstruct
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
[INFO] Plugin version: b2230 (commit 89febfed)

==================================
Running in interactive mode.
==================================

- Press [Ctrl+C] to interject at any time.
- Press [Return] to end the input.
- For multi-line inputs, end each line with '\' and press [Return] to get another line.

[You]: Write me a short description of the Adidas brand

[Bot]:
^C

alex@M1 ~ % wasmedge --dir .:. --nn-preload default:GGML:AUTO:~/ai/gemma-7b-it-Q4_K_M.gguf llama-chat.wasm -p gemma-instruct -c 512
[INFO] Model alias: default
[INFO] Prompt context size: 512
[INFO] Number of tokens to predict: 1024
[INFO] Number of layers to run on the GPU: 100
[INFO] Batch size for prompt processing: 512
[INFO] Temperature for sampling: 0.8
[INFO] Top-p sampling (1.0 = disabled): 0.9
[INFO] Penalize repeat sequence of tokens: 1.1
[INFO] presence penalty (0.0 = disabled): 0
[INFO] frequency penalty (0.0 = disabled): 0
[INFO] Use default system prompt
[INFO] Prompt template: GemmaInstruct
[INFO] Log prompts: false
[INFO] Log statistics: false
[INFO] Log all information: false
[2024-02-25 13:10:22.030] [error] [WASI-NN] GGML backend: Error: unable to init model. Error: "Fail to load model into wasi-nn: Backend Error: WASI-NN Backend Error: Caller module passed an invalid argument"

alex@M1 ~ % wasmedge --dir .:. --nn-preload default:GGML:AUTO:~/ai/gemma-7b-it-Q4_K_M.gguf llama-chat.wasm -p gemma-instruct -c 512
[INFO] Model alias: default
[INFO] ... (same settings as above)
[2024-02-25 13:10:57.677] [error] [WASI-NN] GGML backend: Error: unable to init model. Error: "Fail to load model into wasi-nn: Backend Error: WASI-NN Backend Error: Caller module passed an invalid argument"
```

### Screenshots

_No response_

### Any logs you want to share for showing the specific issue

_No response_

### Model Information

gemma-7b-it-Q5_K_M.gguf, gemma-7b-it-Q4_K_M.gguf

### Operating system information

macOS Sonoma

### ARCH

arm64

### CPU Information

Apple M1

### Memory Size

8GB

### GPU Information

Apple M1

### VRAM Size

8GB
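One thing worth ruling out for the "Caller module passed an invalid argument" failures: the failing runs give the model path as `~/ai/gemma-7b-it-Q4_K_M.gguf`, but `--dir .:.` only maps the current directory into the WASI sandbox, and zsh typically does not expand a `~` that appears after a `:` in the middle of an argument, so the runtime may receive a literal `~` path it cannot open. A quick stand-alone check (my own illustration, not part of LlamaEdge):

```python
import os.path

# The model path exactly as it appears in the failing command's argument.
raw = "~/ai/gemma-7b-it-Q4_K_M.gguf"

# If the shell had expanded "~", the argument would already be absolute.
print(raw.startswith("~"))       # True: the runtime would see a literal "~"

# What the argument should look like after expansion:
expanded = os.path.expanduser(raw)
print(expanded.startswith("~"))  # False: an absolute path the runtime could map
```

If that is the issue, passing the expanded absolute path (and mapping that directory with `--dir`) should at least change the error.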
alexcardo commented 4 months ago

No, the output was like this:

[You]: Write me a short description of the Adidas brand

[Bot]:

`<unused16><unused16><unused16><unused16><unused16><unused16><unused16><unused16>` ... (the `<unused16>` token repeated for the entire response)
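For what it's worth, this failure mode (nothing but repeated special tokens) is easy to detect mechanically, e.g. when scripting a sanity check against a model file. A minimal sketch — the function name and heuristic are my own, not a LlamaEdge API:

```python
import re

def looks_degenerate(text: str) -> bool:
    """Heuristic: True if the output consists solely of <unusedN> special tokens."""
    tokens = re.findall(r"<unused\d+>|\S+", text)
    return bool(tokens) and all(re.fullmatch(r"<unused\d+>", t) for t in tokens)

print(looks_degenerate("<unused16>" * 200))          # True: degenerate output
print(looks_degenerate("Adidas is a global brand"))  # False: normal text
```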

alabulei1 commented 4 months ago

Hi @alexcardo

Can you please try Gemma-2b? It seems that there was something wrong with the Gemma-7B-it GGUF model file. https://huggingface.co/google/gemma-7b-it/discussions/38#65d7b14adb51f7c160769fa1

Will let you know when the problem is solved.

Below is the answer from Gemma-2b:

[You]:
Write me a short description of the Adidas brand

[Bot]:
Sure, here's a description of the Adidas brand:

Adidas is a global leader in sports apparel and footwear, known for its iconic three stripes. The brand was founded in 1949 by Adolf Dassler in Germany and has since become synonymous with athletic excellence and innovation. Adidas offers a wide range of products, including running shoes, footballs, basketballs, and more, for various sports and activities. The brand is also known for its high-performance training apparel and equipment, which is worn by athletes at the Olympic Games and other major sporting events.
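As an aside, when the helper script is run in mode 1 (API Server with Chatbot web app) instead of CLI Chat, the same question can be asked over the server's OpenAI-compatible endpoint. A minimal sketch of building such a request with only the standard library — the port 8080 and the exact payload shape are assumptions on my part, not taken from this thread:

```python
import json
import urllib.request

def build_chat_request(prompt, base_url="http://localhost:8080"):
    """Build an OpenAI-style chat-completions request for a local LlamaEdge server."""
    payload = {
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Write me a short description of the Adidas brand")
print(req.full_url)      # http://localhost:8080/v1/chat/completions
print(req.get_method())  # POST
# To actually send it (requires the server to be running):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```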