Run LLM apps hyper-fast on your local machine, just for fun.
# Serve the Mistral 7B Instruct model on CPU (no GPU offload)
python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf
# Offload all model layers to the GPU (-1 = all layers)
python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1
# Enable the functionary chat format for function calling
python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1 --chat_format functionary
# Serve one or more models defined in a JSON config file (see the sample config below)
python -m llama_cpp.server --config_file config.json
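A minimal sketch of what `config.json` might look like, assuming llama-cpp-python's multi-model config format (server settings such as `host` and `port` plus a `models` list); the aliases and field values here are illustrative, so adjust them to your setup.

```json
{
  "host": "0.0.0.0",
  "port": 8000,
  "models": [
    {
      "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
      "model_alias": "mistral-7b-instruct",
      "n_gpu_layers": -1
    },
    {
      "model": "models/llava-v1.5-7b-Q4_K.gguf",
      "clip_model_path": "models/llava-v1.5-7b-mmproj-Q4_0.gguf",
      "model_alias": "llava-1.5",
      "chat_format": "llava-1-5",
      "n_gpu_layers": -1
    }
  ]
}
```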
# Serve the multimodal LLaVA 1.5 model; --clip_model_path points to the CLIP/mmproj projector
python -m llama_cpp.server --model models/llava-v1.5-7b-Q4_K.gguf --clip_model_path models/llava-v1.5-7b-mmproj-Q4_0.gguf --n_gpu_layers -1 --chat_format llava-1-5
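Once the server is up, you can talk to it through its OpenAI-compatible `/v1` endpoints. Below is a minimal sketch using the `openai` Python client, assuming the default host and port (`http://localhost:8000`); the model alias is an assumption, and with a single loaded model the server generally accepts any name.

```python
from openai import OpenAI

# Point the OpenAI client at the local llama_cpp.server instance
client = OpenAI(
    base_url="http://localhost:8000/v1",  # default llama_cpp.server address
    api_key="not-needed",                 # the local server does not validate the key
)

response = client.chat.completions.create(
    model="mistral-7b-instruct",  # alias from config.json; ignored when only one model is loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GGUF quantization in one sentence."},
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```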
👨🏾💻 Author: Tom Odhiambo
📅 Version: 1.x
📜 License: This project is licensed under the MIT License