A serverless RAG QA bot for various clients
Setup:
pip3 install -r requirements.txt
sudo docker run --gpus '"all"' --shm-size 10g -p 11434:11434 -it alpindale/aphrodite-engine
python3 -m aphrodite.endpoints.openai.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1 --kv-cache-dtype fp8_e5m2 --served-model-name mistral --max-model-len 8096 --host 0.0.0.0 --port 11434
uvicorn api:app --host 0.0.0.0 --port 1337 --reload
curl -X POST http://localhost:1337/ask -H "Content-Type: application/json" -d '{"question": "Why is the sky blue?"}'
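For reference, a minimal sketch of what the api:app service could look like; the actual api.py in this repo almost certainly differs (and does real retrieval). It assumes FastAPI and the openai Python client, pointed at the OpenAI-compatible server started above:

from fastapi import FastAPI
from openai import OpenAI
from pydantic import BaseModel

app = FastAPI()
# The aphrodite server above exposes an OpenAI-compatible API on port 11434.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="none")

class Question(BaseModel):
    question: str

@app.post("/ask")
def ask(q: Question):
    # In the real bot, retrieved document context would be added to the prompt here.
    resp = client.chat.completions.create(
        model="mistral",  # matches --served-model-name above
        messages=[{"role": "user", "content": q.question}],
    )
    return {"answer": resp.choices[0].message.content}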
Run the frontend:
nvm install 18
nvm use 18
npm run dev
Run Ollama
cd ollama-docker
docker-compose up -d
# or, if you have an NVIDIA GPU configured:
docker-compose -f docker-compose-ollama-gpu.yaml up -d
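For orientation, the GPU variant of the compose file typically grants the container GPU access through a device reservation. A sketch of the relevant part (the actual docker-compose-ollama-gpu.yaml in ollama-docker may differ):

services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    deploy:
      resources:
        reservations:
          devices:
            # Expose all NVIDIA GPUs to the container
            - driver: nvidia
              count: all
              capabilities: [gpu]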
http://localhost:8000 to see what's running
http://localhost:11434 ollama service
http://localhost:3000 to see the UI
To download the model:
docker ps                        # lists the running containers
docker exec -it ollama bash      # enter the ollama container
ollama pull mistral
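The two container steps can also be collapsed into a single command run from the host:

docker exec -it ollama ollama pull mistral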
ollama run mistral               # run inside the container to test it's working
TODO: pull Mistral automatically when the container is built (i.e. add the pull to the Dockerfile)
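One possible (untested) way to close that TODO, assuming the image is built from the official ollama/ollama base: pull the model in a RUN step so it is baked into the image. Note that a volume mounted over /root/.ollama at runtime would hide the baked-in model.

FROM ollama/ollama
# ollama pull talks to a running server, so start one in the background,
# give it a moment to come up (crude wait), then pull; the downloaded
# weights end up in /root/.ollama inside this image layer.
RUN ollama serve & sleep 5 && ollama pull mistral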