Fix+feat: docker compose

StefanDanielSchwarz commented 4 months ago

This fixes and improves the Docker files:

The old entrypoint, ENTRYPOINT [ "/app/aphrodite-engine/entrypoint.sh" ], was never used because there's no /app directory as the Dockerfile uses /workspace/aphrodite-engine instead. And if it were used, it would be too limiting since it only supports some settings and the only way to add more would be to override it anyway.

There's a more elegant and flexible way, using Docker Compose's environment variables in the command directive, which works better and makes an entrypoint script unnecessary. That's why I've reworked the Composefile to fix the issues and provide improved features.

There's also an accompanying .env file which has all the options commented out. It's not strictly required since you could prefix them to the docker compose call like this: PORT=7860 MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2 docker compose up -d – but it's a helpful example and makes changing and keeping track of settings easier, and there's no need to edit the Composefile itself.

New is the envvar CMD_ADDITIONAL_ARGUMENTS which can take any number of additional arguments that haven't been covered by the existing envvars. For example, to fix seed to 42, set it like this: CMD_ADDITIONAL_ARGUMENTS="--seed 42" – in the .env file or as docker compose prefix.

Another very useful feature is support for SSL/TLS certs: Just point the envvars SSL_CERTFILE and SSL_KEYFILE to your certs and Aphrodite will mount and use them. No need to edit the Composefile, and the mounts only apply when the vars are set.

Finally some additional little tweaks, like supporting HUGGING_FACE_HUB_TOKEN, shared memory replaced with interprocess communication, and using the host's localtime and timezone.

To show an actual example of how I run Mixtral with Aphrodite, I've included that as env.example. It's a working configuration for Mixtral 8x7B AWQ with 32K context on a 48 GB GPU.

Hope this is useful to you and others. Aphrodite is excellent and hopefully the improved Docker files will make it easier to run, helping it gain more of the popularity it deserves.

AlpinDale commented 4 months ago

Thanks for this! I had a local branch where I fixed most of these issues but I never got around to finalising and committing it. This approach looks better. I'll review in a bit.

AlpinDale commented 3 months ago

Left a few comments. Additionally, do you think it's possible (in a follow-up PR), to create a GH workflow for automatically building a docker image (and upload to docker hub) for every commit or tagged release?

StefanDanielSchwarz commented 3 months ago

Sure, I'll take care of the points you mentioned. And auto-building a docker image would definitely be great, so I'll look into that after this is done.

I also have some questions as I'm not sure what's your intended setup here:

There's the built-in default port 2242 and then there's also port 7860, and I often see APIs on 5000 (at least that's SillyTavern's default). So which of the three ports do you want to use by default? We could also use different ports depending on the endpoint, like 5000 for OpenAI or 7860 for Kobold, if their standard ports differ. Just let me know which port you want for consistency.

Same with paths: I've seen /app/aphrodite-engine, /workspace/aphrodite-engine, and many projects just put their files directly into /app. So which location should we standardize on? Right now it's /workspace/aphrodite-engine in the last published image.

AlpinDale commented 3 months ago

For docker, we may want to do 7860 for OpenAI and 5000 for Kobold (official Kobold is also 5000 by default). I'll trust your judgement for the default paths.

StefanDanielSchwarz commented 3 months ago

Alright, here's a reworked and much-improved version:

Endpoint is now configurable, with OpenAI being the default.
Port is 5000 by default. Made the most sense to me, as that's what Kobold uses and also ooba's API. (Haven't seen 7860 as an API port yet, only for UIs, and it's probably easier if there's only one port and users can change it anyway.)
Entrypoint is back! Since Kobold doesn't support all the options of vLLM (no API keys, no SSL certs), I had to bring it back to work with that logic.
Dockerfile refactored. Also uses /app/aphrodite-engine as its workdir, just like Dockerfile.rocm. Seems more appropriate to me that way.
Non-root by default. It's better for security to avoid a root user, so I've switched to UID 1000, and made it configurable. Besides improved security that also avoids new files being owned by root inside the user's cache folder if that's mounted from the host. To make this work no matter the user ID, downloaded files are now stored in the container's /tmp directory.
Volume and bind mount are now both supported. By default a volume will be used for /tmp (where downloaded models are now stored) so files won't have to be redownloaded every time the container is recreated. But if you uncomment HF_CACHE, it will use your own ~/.cache/huggingface from the host, so your download cache is shared between host user and container, saving disk space since there won't be duplicated model downloads.
Timezone is now configured through an env var instead of mounting host files, making it more compatible with different host environments.
Please note that these changes require a rebuild of the image!

All in all it should be more compatible this way, no matter if the container is run from docker compose or directly without a compose file, and no matter if it's as root or as a normal user.

AlpinDale commented 3 months ago

LGTM. If you don't have anything else to add, I'll merge now.

StefanDanielSchwarz commented 3 months ago

Some fixes, and I've replaced the build-essential package (which is actually intended for building Debian packages, so it includes a lot of stuff we don't need for Python apps) with just the parts we need.

LGTM

I was in the middle of building the new image when my server went down with a hardware issue. :( Did you build the image and run it to see if everything works? I tested with the entrypoint mounted into the existing image and that worked for me, but with all the changes, a little bit of testing would be useful.

AlpinDale commented 3 months ago

Tried building it (before the latest commit here). Builds successfully, but running it gives me this error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "$HOME/docker/entrypoint.sh": stat $HOME/docker/entrypoint.sh: no such file or directory: unknown.
ERRO[0000] error waiting for container:

Steps to reproduce:

docker build -t aphrodite-engine .
docker run aphrodite-engine:latest

StefanDanielSchwarz commented 3 months ago

Tried to build it on my workstation, but after almost half an hour, WSL just crashed. Damn, I'm having terrible luck building images today! Will have to troubleshoot this indirectly.

The problem is that the entrypoint doesn't substitute the $HOME variable. The Dockerfile reference explains that entrypoints support variable substitution, but only when not using the exec form (which we use). Since the shell form has other drawbacks, I'll just specify the path without a variable, that will fix the issue.

StefanDanielSchwarz commented 3 months ago

LGTM now: Entrypoint correctly specified and made executable. Image will need to be rebuilt, but this should work now. 🤞

StefanDanielSchwarz commented 3 months ago

By the way, the crashes of my server and workstation were actually caused by the Docker build causing OOM - a known issue of vLLM: OOM during Docker build · Issue #1782 · vllm-project/vllm

What are your system specs? Are you doing anything special to make it build successfully?

AlpinDale commented 3 months ago

You may need to limit the maximum number of build processes with MAX_JOBS=n. Ninja expects you to have 8GB of RAM per thread.

StefanDanielSchwarz commented 3 months ago

Made some changes to the SSL paths. Problem was that with a non-root user, the key wouldn't be readable in the default location, so now the cert and key are mounted into the workdir where they can be read properly.

Also tried to build the image again, using these commands:

export MAX_JOBS=4
export NVCC_THREADS=2
docker build . -t aphrodite-engine --build-arg max_jobs=4 --build-arg nvcc_threads=2

Ran for four hours until I killed it. WSL seems to be not very suitable to build such a big image. Or did I set the variables incorrectly?

StefanDanielSchwarz commented 3 months ago

Found and squished another little bug (when ENDPOINT was unset, it wouldn't use the OpenAI-specific properties like API key or SSL certs)

AlpinDale commented 3 months ago

Sorry I've been away for a bit, I'll test again soon. Thanks for the hard work.

AlpinDale commented 3 months ago

Sorry for the delay, @StefanDanielSchwarz, I was on a break.

I got around to testing it just now. Builds fine, but I'm having trouble launching the image (with both docker run and docker compose). With docker run, it seems as if the .env file isn't being sourced properly, and with compose, the image immediately exits with code 0. Maybe I'm doing it wrong?

StefanDanielSchwarz commented 3 months ago

Welcome back and thanks for looking at it now!

Strange that. Docker run doesn't use the .env file, but by default that's all commented-out anyway.

Is Docker not logging anything? I'd comment out env_file, ipc and user lines to see if it starts correctly then.

Are you on a Linux, Windows or Mac host? Is the entrypoint executed (I'd add an echo debug statement at the top to see if that's actually reached)?

AlpinDale commented 3 months ago

Issue might've been the entrypoint script, unsure. It seems to work now. I'll merge this for now, since the existing dockerfile is broken anyway. If this has any issues, we can fix in a follow-up PR. Thanks for the contribution!

StefanDanielSchwarz commented 3 months ago

@AlpinDale It's merged, but the image alpindale/aphrodite-engine:latest hasn't been updated yet, right? Because until that is done, the entrypoint inside the container is still the old one, and it won't work properly.

StefanDanielSchwarz commented 3 months ago

@AlpinDale bump Any progress?

PygmalionAI / aphrodite-engine

Fix+feat: docker compose #264