PygmalionAI / aphrodite-engine

Large-scale LLM inference engine
https://aphrodite.pygmalion.chat
GNU Affero General Public License v3.0

[Usage]: start aphrodite in docker with tensor parallel #562

Closed: kulievvitaly closed this issue 1 week ago

kulievvitaly commented 1 month ago

Your current environment

I have a server with 4x RTX 3090 Ti. I can run Llama 3 70B with vLLM in Docker using this command:

```bash
sudo docker run --shm-size=32g --log-opt max-size=10m --log-opt max-file=1 \
    --rm -it --gpus '"device=0,1,2,3"' -p 9000:8000 \
    --mount type=bind,source=/home/me/.cache,target=/root/.cache \
    vllm/vllm-openai:v0.5.3.post1 \
    --model casperhansen/llama-3-70b-instruct-awq \
    --tensor-parallel-size 4 --dtype half \
    --gpu-memory-utilization 0.92 -q awq
```

I have made multiple attempts to start Aphrodite in Docker with tensor parallelism. Non-standard argument names and insufficient documentation led to errors and strange behavior. Please add an example of how to run Aphrodite with the Llama 3 70B model and exl2 quantization on 4 GPUs.

How would you like to use Aphrodite?

I want to run bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw, but I don't know how to run it with Aphrodite.

AlpinDale commented 1 month ago

We don't take command-line arguments directly in the docker launch command. You have to supply them as environment variables; please see the .env file in the docker directory for examples, which also cover multi-GPU setups.
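For illustration, a launch could look roughly like the sketch below. The variable names (MODEL_NAME, NUM_GPUS, QUANTIZATION), the port, and the image tag are placeholders assumed for this sketch, not confirmed values; check docker/.env for the actual names:

```bash
# Sketch only: MODEL_NAME, NUM_GPUS, and QUANTIZATION are placeholder names,
# not confirmed Aphrodite variables; the real ones are listed in docker/.env.
# The image tag and port are also assumptions for this example.
sudo docker run --shm-size=32g --rm -it --gpus '"device=0,1,2,3"' \
    -p 2242:2242 \
    --mount type=bind,source=/home/me/.cache,target=/root/.cache \
    -e MODEL_NAME=bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw \
    -e NUM_GPUS=4 \
    -e QUANTIZATION=exl2 \
    alpindale/aphrodite-engine
```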

But you're right, our Docker documentation is very much lacking. The next update includes some Docker overhauls, and I will make sure to update the wiki.

AlpinDale commented 1 week ago

v0.6.0 changed the Docker setup to take arguments directly as CLI args. Please see the Docker section of the documentation, or the snippet in the README.
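Adapting the vLLM command from the original report, a v0.6.0 launch might look like this sketch; the image name and default port (2242) are assumptions here, so verify them against the README's Docker snippet:

```bash
# Sketch of the v0.6.0 CLI-args style. The image name and default port
# (2242 assumed) should be verified against the README's docker snippet.
sudo docker run --shm-size=32g --rm -it --gpus '"device=0,1,2,3"' \
    -p 2242:2242 --ipc=host \
    --mount type=bind,source=/home/me/.cache,target=/root/.cache \
    alpindale/aphrodite-openai:latest \
    --model bullerwins/Meta-Llama-3.1-70B-Instruct-exl2_6.0bpw \
    --tensor-parallel-size 4 \
    --quantization exl2
```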