Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai

Add explanation for Windows users on how to create EXE files #419

Open · fabiomatricardi opened this issue 1 month ago

fabiomatricardi commented 1 month ago

Discussed in https://github.com/Mozilla-Ocho/llamafile/discussions/418

Originally posted by **fabiomatricardi** May 15, 2024

Ciao, I tried to ask in the Discord channel but got no replies... so after a week of struggling I figured out how to do it. I would like this to be on the main page of the repo.

### How to create .exe files in Windows

- Download a GGUF smaller than 4 GB (in my example `qwen1_5-0_5b-chat-q8_0.gguf`, from the official Qwen repo: it already has the chat template and tokenizer included in the GGUF).
- Download the zip file for the latest llamafile release [here](https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.4/llamafile-0.8.4.zip), unzip it into the same folder as the GGUF, and rename the extension to `.exe`.
- Download zipalign from [here](https://github.com/Mozilla-Ocho/llamafile/releases/download/0.8.4/zipalign-0.8.4) into the same folder and rename the extension to `.exe`.

In my case I want the executable to run the API server with a few extra arguments (context length). Create a `.args` file as explained in [Creating Llamafiles](https://github.com/Mozilla-Ocho/llamafile#creating-llamafiles); the file will contain one argument per line:

```
-m
qwen1_5-0_5b-chat-q8_0.gguf
--host
0.0.0.0
-c
12000
...
```

In the terminal, run the following to get a copy of the base binary:

```
copy .\llamafile-0.8.4.exe qwen1_5-0_5b-chat-q8.llamafile
```

Then use zipalign to bundle the llamafile together with the GGUF file and the arguments:

```
.\zipalign.exe -j0 qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8_0.gguf .args
```

Finally, rename the `.llamafile` to `.exe`:

```
ren qwen1_5-0_5b-chat-q8.llamafile qwen1_5-0_5b-chat-q8.exe
```

### Run the Qwen model

From the terminal, run:

```
.\qwen1_5-0_5b-chat-q8.exe --nobrowser
```

This will load the model and start the web server without opening the browser.

### Python API call

```
from openai import OpenAI
import sys

# Point to the local llamafile server
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

history = [
    {"role": "system", "content": "You are QWEN05, an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful. Always reply in the language of the instructions."},
    {"role": "user", "content": "Hello, introduce yourself to someone opening this program for the first time. Be concise."},
]

print("\033[92;1m")  # bright green
while True:
    completion = client.chat.completions.create(
        model="local-model",  # this field is currently unused
        messages=history,
        temperature=0.3,
        frequency_penalty=1.4,
        max_tokens=600,
        stream=True,
    )
    # Stream the assistant reply token by token
    new_message = {"role": "assistant", "content": ""}
    for chunk in completion:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
            new_message["content"] += chunk.choices[0].delta.content
    history.append(new_message)
    print("\033[1;30m")  # dark grey
    print("Enter your text (end input with Ctrl+D on Unix or Ctrl+Z on Windows) - type quit! to exit the chatroom:")
    print("\033[91;1m")  # red
    lines = sys.stdin.readlines()  # readlines() keeps the trailing newlines
    userinput = "".join(lines)
    if lines and "quit!" in lines[0].lower():
        print("\033[0mBYE BYE!")
        break
    # Reset the context to the system prompt plus the new user input
    history = [
        {"role": "system", "content": "You are an intelligent assistant. You always provide well-reasoned answers that are both correct and helpful."},
    ]
    history.append({"role": "user", "content": userinput})
    print("\033[92;1m")
```

The input accepts multi-line entries: when you are finished, press Ctrl+Z and then Enter. To exit, type quit! followed by Ctrl+Z and Enter.
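Since the packaging is just three commands, it can also be scripted. Below is a minimal Python sketch (not from the original post) that automates the copy/zipalign/rename steps under the same assumptions as above: the file names are the ones from this example, and the script is run from the folder containing `zipalign.exe`, the GGUF, and the `.args` file.

```
# Minimal sketch (illustrative only): automate the three packaging steps
# above on Windows. File names are the ones from this example; adjust as
# needed. Assumes llamafile-0.8.4.exe, zipalign.exe, the GGUF, and .args
# all live in the current folder.
import shutil
import subprocess

BASE = "llamafile-0.8.4.exe"
GGUF = "qwen1_5-0_5b-chat-q8_0.gguf"
OUT = "qwen1_5-0_5b-chat-q8"

# 1. copy the base binary to a fresh .llamafile
shutil.copyfile(BASE, OUT + ".llamafile")

# 2. embed the GGUF and the .args into the archive with zipalign
subprocess.run(
    ["zipalign.exe", "-j0", OUT + ".llamafile", GGUF, ".args"],
    check=True,
)

# 3. rename the result to .exe so Windows will run it
shutil.move(OUT + ".llamafile", OUT + ".exe")
```

Running this from the folder containing the files should leave you with the same `qwen1_5-0_5b-chat-q8.exe` as the manual steps.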
mofosyne commented 1 month ago

Are you able to make a PR proposal with your proposed changes so it can be reviewed and potentially merged in?

fabiomatricardi commented 1 month ago

sure... how?

mofosyne commented 1 month ago

@jart is this actually more suitable for a wiki instead? If so, could you enable it? This might be more of a freeform doc than formal instructions.

Otherwise, would it make more sense for him to make a new /docs/ folder in the repo?

@fabiomatricardi how experienced are you with making GitHub contributions? If you're not very experienced we can try to accommodate, but we do recommend you learn a bit about how to use GitHub Pull Requests so your contributions can be more easily tracked.

fabiomatricardi commented 1 month ago

Hi Brian, I have never done it before, but I can try; I have only maintained my own repositories. Just let me know whether I should create a new folder like llamafile/docs/windowswiki or use a wiki instead. If I should go the pull request route, let me know whether to do it from a fork or whether you will give me write permissions to the repo.

waiting to hear from you
