Run LLM apps hyper-fast on your local machine, just for fun.
# Serve the Mistral 7B Instruct model on CPU (no GPU offload)
python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf
# Offload all model layers to the GPU (-1 = all layers)
python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1
# Enable the functionary chat format for function calling
python -m llama_cpp.server --model models/mistral-7b-instruct-v0.1.Q4_0.gguf --n_gpu_layers -1 --chat_format functionary
# Serve one or more models defined in a JSON config file (see the sample config below)
python -m llama_cpp.server --config_file config.json
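A minimal sketch of what `config.json` might look like, assuming llama-cpp-python's multi-model config format (server settings such as `host` and `port` plus a `models` list); the aliases and field values here are illustrative, so adjust them to your setup.

```json
{
  "host": "0.0.0.0",
  "port": 8000,
  "models": [
    {
      "model": "models/mistral-7b-instruct-v0.1.Q4_0.gguf",
      "model_alias": "mistral-7b-instruct",
      "n_gpu_layers": -1
    },
    {
      "model": "models/llava-v1.5-7b-Q4_K.gguf",
      "clip_model_path": "models/llava-v1.5-7b-mmproj-Q4_0.gguf",
      "model_alias": "llava-1.5",
      "chat_format": "llava-1-5",
      "n_gpu_layers": -1
    }
  ]
}
```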
# Serve the multimodal LLaVA 1.5 model; --clip_model_path points to the CLIP/mmproj projector
python -m llama_cpp.server --model models/llava-v1.5-7b-Q4_K.gguf --clip_model_path models/llava-v1.5-7b-mmproj-Q4_0.gguf --n_gpu_layers -1 --chat_format llava-1-5
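Once the server is up, you can talk to it through its OpenAI-compatible `/v1` endpoints. Below is a minimal sketch using the `openai` Python client, assuming the default host and port (`http://localhost:8000`); the model alias is an assumption, and with a single loaded model the server generally accepts any name.

```python
from openai import OpenAI

# Point the OpenAI client at the local llama_cpp.server instance
client = OpenAI(
    base_url="http://localhost:8000/v1",  # default llama_cpp.server address
    api_key="not-needed",                 # the local server does not validate the key
)

response = client.chat.completions.create(
    model="mistral-7b-instruct",  # alias from config.json; ignored when only one model is loaded
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain GGUF quantization in one sentence."},
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```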
👨🏾💻 Author: Tom Odhiambo
📅 Version: 1.x
📜 License: This project is licensed under the MIT License