Closed kulievvitaly closed 1 week ago
We don't take command-line arguments directly in the docker launch command. You will have to supply them as environment variables - please see the .env file in the docker directory for examples. Multi-GPU is covered there as well.
But you're right, our Docker documentation is very much lacking. The next update includes some Docker overhauls; I will make sure to update the wiki.
v0.6.0 has changed the Docker image to take arguments directly as CLI args. Please see the Docker section in the documentation, or the snippet in the README.
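To illustrate the v0.6.0 pass-through style, here is a minimal sketch: everything after the image name is handed straight to the engine, vLLM-style. The image name, tag, and the small model used here are assumptions for illustration, not taken from the docs.

```shell
# Hypothetical sketch - image name/tag assumed; in v0.6.0+ the trailing
# arguments are passed directly to the engine as CLI args.
sudo docker run --rm -it --gpus all \
  -p 2242:2242 \
  alpindale/aphrodite-openai:latest \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --port 2242
```

Verify the exact image name and supported flags against the Docker section of the documentation before relying on this.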
Your current environment
I have server with 4x3090ti. I can run llama 3 70b with vllm in docker with command:
sudo docker run --shm-size=32g --log-opt max-size=10m --log-opt max-file=1 --rm -it --gpus '"device=0,1,2,3"' -p 9000:8000 --mount type=bind,source=/home/me/.cache,target=/root/.cache vllm/vllm-openai:v0.5.3.post1 --model casperhansen/llama-3-70b-instruct-awq --tensor-parallel-size 4 --dtype half --gpu-memory-utilization 0.92 -q awq
I made multiple attempts to start Aphrodite in Docker with tensor parallelism. Non-standard argument names and insufficient documentation led to errors and strange behavior. Please add an example of how to run Aphrodite with a Llama 3 70B model and exl2 quantization on 4 GPUs.
How would you like to use Aphrodite?
I want to run bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw, but I don't know how to load it with Aphrodite.
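For the requested setup, a hedged sketch of the kind of command that should work under the v0.6.0 CLI-args style, mirroring the vLLM invocation above. The image name is an assumption, and it assumes exl2 is accepted by the --quantization flag; check the documentation for the exact spellings.

```shell
# Hypothetical example - image name assumed, and exl2 support via
# --quantization assumed; the flag names mirror Aphrodite's vLLM-style CLI.
sudo docker run --shm-size=32g --rm -it --gpus '"device=0,1,2,3"' \
  -p 2242:2242 \
  --mount type=bind,source=/home/me/.cache,target=/root/.cache \
  alpindale/aphrodite-openai:latest \
  --model bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.92
```

The --shm-size and cache mount are carried over from the working vLLM command earlier in this thread; tensor parallelism needs the enlarged shared memory segment for inter-process communication.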