Closed HPPinata closed 1 month ago
The worker also doesn't cope well with the SIGTERM/SIGKILL signals docker sends. I tried extending the timeout to as much as 90 seconds, and it didn't even register that it was supposed to stop. Any ideas why that would be, @tazlin?
I'm not sure, to be honest. I would ordinarily suggest that people using the docker images, for example on a cloud host, first put their worker into maintenance and wait for the worker's current queue to clear. Even in the "normal" situation of the worker running directly on a bare-metal machine, it could theoretically take 10 minutes to shut down, as that is the approximate worst-case job expiry time.
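For context on why a stop can take this long: docker stop sends SIGTERM and follows up with SIGKILL once the timeout expires, and a process only shuts down gracefully if it installs a handler and checks for the request between units of work. The following is a minimal Python sketch of that pattern, not the worker's actual shutdown code. The names `stop_requested`, `handle_stop`, `install_stop_handlers`, and `main_loop` are hypothetical and chosen only for illustration.

```python
import signal
import threading

# Hypothetical flag; the real worker's shutdown mechanism may differ.
stop_requested = threading.Event()


def handle_stop(signum, frame):
    """Signal handler: only record the request, do no heavy work here."""
    stop_requested.set()


def install_stop_handlers():
    """Register the handler for SIGTERM and SIGINT.

    Must be called from the main thread, which is where Python
    delivers signals.
    """
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, handle_stop)


def main_loop(fetch_next_job, process):
    """Process jobs until a stop is requested or no job is available.

    The flag is checked only between jobs, so a job that has already
    started always runs to completion before the loop exits.
    """
    results = []
    while not stop_requested.is_set():
        job = fetch_next_job()
        if job is None:
            break  # sketch only: a real worker would wait and poll again
        results.append(process(job))
    return results
```

In a sketch like this, a job that is already running is always finished before the loop exits, so the stop timeout would need to be at least as long as the longest possible job.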
I'm not sure whether the compose.cuda.yaml file works, this needs testing by someone else.
I can confirm the cuda compose works as intended, though I stumbled onto a landmine doing so. I have put a cautionary note in the readme as a result.
Yeah, I failed to consider wanting to mount external files/dirs into the container from other locations. But with that fixed I think this is ready. I tested the ROCm version (it was similar to what I was doing locally anyway), so I don't expect any issues there either.
Does $AIWORKER_CACHE_HOME, etc. just pick up what is exported in your shell, or is it necessary to reference (and document) a .env file in the compose.yaml?
AIWORKER_CACHE_HOME is the canonical and long-standing way of setting the worker models directory. That environment variable is relied on in numerous places throughout various horde libraries. It was previously referenced elsewhere in the context of docker. Per your recommendation, I moved that information to the new readme and added a link in the old documentation pointing to Dockerfiles/README.md, so the information is centralized in one place. I think this addresses the root of your concern.
I'm just happy there's something useful to do even without much programming knowledge. I'll probably look into modifying the compose file so it too properly picks up the .env file (when present), maybe even automatically switching between mounting (or not mounting) bridgeData.yaml and referencing .env, but that'll need some more testing (I haven't looked into how compose reacts when instructed to reference a non-existent file).
The current version should work as long as the env variables are exported in the shell and should also serve as decent bases/templates for anyone wanting to create more individualized configs.
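On the question of exported variables versus a .env file: when Compose substitutes variables in a compose file, a value exported in the shell takes precedence over the same key in a .env file in the project directory. The Python sketch below is a simplified model of that lookup order for illustration only. It is not Compose's actual parser, and `parse_dotenv`, `resolve`, and the example paths are hypothetical.

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Deliberately minimal: no multi-line values, no nested expansion.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(name, shell_env, dotenv_values):
    """Return the shell value if exported, else the .env value, else None."""
    if name in shell_env:
        return shell_env[name]
    return dotenv_values.get(name)


# Hypothetical .env content and paths, for illustration only.
dotenv = parse_dotenv("# models directory\nAIWORKER_CACHE_HOME=/data/models\n")
from_file = resolve("AIWORKER_CACHE_HOME", {}, dotenv)
from_shell = resolve(
    "AIWORKER_CACHE_HOME", {"AIWORKER_CACHE_HOME": "/mnt/cache"}, dotenv
)
```

Here `from_file` falls back to the .env value because nothing is exported, while `from_shell` uses the exported value. Passing `os.environ` as `shell_env` would check the real environment.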
As for AMD: I'm just not happy with Nvidia having effectively no competition. Since I don't have any critical applications, I can at least test and report back on what AMD does implement, so maybe the situation will improve over a few years.
Add support for docker compose for easier setup, updates, etc. The sparse clone/checkout is nice for having an easily updatable setup without all of the unnecessary origin/main clutter. Mounting in the models directory is also sensible, both for performance (gigantic volume mounts can be slow) and for not having to re-download everything after a container is recreated.
I'm not sure whether the compose.cuda.yaml file works; this needs testing by someone else.