NOAA-OWP / DMOD

Distributed Model on Demand infrastructure for OWP's Model as a Service

Refactor/Simplify Docker swarm/compose configurations #651

Open christophertubbs opened 2 months ago

christophertubbs commented 2 months ago

The docker swarm/compose configurations are currently more complicated than what our current use case (development) requires. As a result, we have mechanisms in place that make deployment, development, and interaction with DMOD more complicated than necessary.

Network configurations, for instance, may be partially (or even completely) removed. While forming firm network boundaries is good from a highly secure production deployment standpoint, the application is not ready for that level of detail and complexity.

If we're requiring and passing around .env files, there's also not a lot of need for giant lists of environment variables.

The current usage of certs also needs to be reduced - the only place we need certs is where the wider world enters the swarm, i.e. the proxy (maybe the request-service depending on how things get stitched together). Docker encrypts on its own, so all we're achieving is an additional layer of protection (TLS inside TLS).

The default names of images also cause confusion and look rough. Out of the box, you get images that look like "127.0.0.1:5000/dmod-data-service:latest". That "127.0.0.1:5000" portion should, by default, be tied directly to a user to keep image names unique. A default of "$USER/dmod-data-service:-development" is far more effective at identifying what is going on.
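One way to sketch that kind of default is compose-file interpolation with fallbacks. The variable names below (`DOCKER_REGISTRY`, `DMOD_IMAGE_TAG`) are hypothetical, not existing DMOD settings:

```yaml
services:
    data-service:
        # With nothing set in the environment or .env, this resolves to
        # 127.0.0.1:5000/dmod-data-service:development; either piece can
        # be overridden independently. Variable names are illustrative.
        image: "${DOCKER_REGISTRY:-127.0.0.1:5000}/dmod-data-service:${DMOD_IMAGE_TAG:-development}"
```

A user-specific default tag could be supplied by exporting `DMOD_IMAGE_TAG=$USER` (or setting it in the .env) rather than baking `$USER` into the compose file itself.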

We can add advanced versions of the things being removed/replaced later if really needed. As it stands today, these values hurt more than help.

robertbartel commented 2 months ago

The docker swarm/compose configurations are currently more complicated than what our current use case (development) requires. As a result, we have mechanisms in place that make deployment, development, and interaction with DMOD more complicated than necessary.

There are definitely things that are complicated. There are almost certainly things that could be simplified. There is a good chance that there are some parts that are over-complicated and not really needed. But the description risks oversimplifying things.

The images are a good example. The non-registry components of the image repository names really should be modified to make them more consistent. The registry part - i.e., 127.0.0.1:5000 - is totally configurable (albeit globally), and by design. We want to bundle an optional registry with a deployment, so users don't have to worry about how to do that themselves. And that's where this bundled registry will be found on the network (though even those settings can be tweaked). But if users are already separately running a Docker registry, they should be able to supply that to the deployment config instead and not bother starting the registry stack. There's got to be something at whatever registry is pointed to, though. I suppose if the user has a Dockerhub account, they should be able to configure things in the .env to get image names like $USER/dmod-data-service:-development (except for the tagging, but see #661). But I don't think we should assume they have a Dockerhub account, or require them to set one up.
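Concretely, swapping in an external registry could be as small as one .env override; the variable name here is an assumption for illustration and may not match DMOD's actual .env keys:

```shell
# Hypothetical .env entry: point at an already-running registry
# instead of starting the bundled registry stack.
DOCKER_INTERNAL_REGISTRY=registry.example.com:5000
```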

There ends up being a lot of nuance to different aspects of this, so I suggest we split this larger item into individual issues where we can individually track more narrowly focused discussions. And then we can link them back to here.

robertbartel commented 2 months ago

Also, FWIW, see #662, which should include some CLI capabilities for listing the images and tags that are available for worker images. We should be able to clean up how that's presented to leave out technical details like the registry portion of a repository.

christophertubbs commented 2 months ago

The top description is meant to serve as a starting point for splitting out tasks. There are four obvious ones in there - simplifying network config, simplifying the location of environment variables, simplifying certs, and fixing default names for generated images.

The registry part - i.e., 127.0.0.1:5000 - is totally configurable (albeit globally), and by design

I was wrong in that I assumed these were being stored along with the images in the local image... store(?). Now I know that's not right - you're right that if you are using a local registry, there's no getting around having to pass in the address and port (no matter how dumb I think that is on Docker's part, lol). Despite this, I still think it'd be wise to append usernames as tags to built images to maintain image uniqueness in shared environments, and to allow common names as values from optional parameters - i.e., go with the untagged/"official" version when told to, but default to 127.0.0.1:5000/dmod-data-service:$USER. I'm growing more and more curious about whether there are better-informed ways to use tags in place of 127.0.0.1:5000/dmod-data-service:$USER, but my ability to experiment is a tad limited on my GFE.

aaraney commented 2 months ago

@christophertubbs, are you envisioning basically a compose file that has default values for any env variables and a simplified docker network topology? So, for example, just a single network that is used by all services and workers?
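That flatter topology could be sketched as a single shared network attached to every service. The service and network names below are purely illustrative, not DMOD's actual ones:

```yaml
networks:
    dmod-net:
        driver: overlay  # "bridge" would suffice for plain (non-swarm) compose

services:
    request-service:
        image: "dmod-request-service:latest"
        networks:
            - dmod-net
    data-service:
        image: "dmod-data-service:latest"
        networks:
            - dmod-net
```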

christophertubbs commented 2 months ago

Simplified topology (if at all, kinda), but it looks like there's overlap between variables in the .env and variables in the docker-compose files. The configuration for request-service uses ~22 environment variables - that's a lot. Also, some of the environment variables for connecting to redis use inconsistent naming, which results in more advanced configurations, such as adding an alias to myredis's network identifier. If they're in there just to use environment variables to declare environment variables, create_env_config.sh should be used, not more complicated configurations.

aaraney commented 2 months ago

In the compose files we can set default values for all the environment variables for which we can (and for which it makes sense to) provide defaults, using the syntax ${VAR:-default}. More about that here. Does that solve most of your qualms if we create a simplified compose file that has mostly default values specified and a simplified network topology?
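As a concrete illustration of that `${VAR:-default}` syntax (the variable and service names here are made up, not DMOD's actual ones):

```yaml
services:
    example-service:
        image: "busybox:latest"
        environment:
            # Uses REDIS_HOST from the shell or .env if set, else "myredis"
            REDIS_HOST: "${REDIS_HOST:-myredis}"
            REDIS_PORT: "${REDIS_PORT:-6379}"
```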

christophertubbs commented 2 months ago

Having a handful of environment variables defined in the docker-compose files is fine, but once we start getting past 10, we need to find another avenue for attaching them in order to cut down on clutter. In the docker-compose files, if you declare:

services:
    busy:
        image: "busybox:latest"
        command: "env"
        env_file: ".env"

All environment variables defined within that ".env" file end up as environment variables within that container. The contents of that experimental file were just:

TEST_VARIABLE="${WRES_LOG_LEVEL:-}"

(WRES_LOG_LEVEL being one of the environment variables already in my terminal)

and the output was:

[+] Running 1/1
 ✔ Container docker_test-busy-1  Recreated                                                                                                                                                                                                                                                                              0.1s 
Attaching to busy-1
busy-1  | PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
busy-1  | HOSTNAME=65f06ae5f1ae
busy-1  | TEST_VARIABLE=DEBUG
busy-1  | HOME=/root
busy-1 exited with code 0

If a container is going to have a ton of environment variables, they need to be offloaded into another file for consistency, readability, and maintainability.
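For instance, a heavily configured service could combine a shared file with a service-specific one; compose's env_file accepts a list and applies the files in order, with later files overriding earlier ones. The file names below are hypothetical:

```yaml
services:
    request-service:
        image: "dmod-request-service:latest"
        env_file:
            - "common.env"           # values shared across services
            - "request-service.env"  # service-specific values and overrides
```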

robertbartel commented 2 months ago

I think moving to manually maintained, service-specific environment config files has a good chance of not being as simple and simplifying as hoped for, and/or will sacrifice things we are relying on, especially leveraging Docker's compose file interpolation to handle defaults and missing values. That interpolation is important, because some of the values from the DMOD .env get used both here and in other places. There is real value to making sure there is one source of truth for such things, and ensuring other places reference it properly.

That said, glancing quickly, there are several hard-coded (container-level) environment variables just written into service configurations directly. Those could just as easily be hard-coded in the Python code instead. So there is definitely room for some clean-up.

It also might be worth moving services to the aforementioned individual .env-style config files while at the same time adding some DMOD tools to generate and manage such service-specific configs dynamically based on the global DMOD environment config ... and porting the main DMOD environment config from something purely .env-based to something backed by a Pydantic model class in order to facilitate making those tools (as well as other things). Though that'd be a rather large undertaking.

christophertubbs commented 2 months ago

I showed above that the effect of putting the variables in the specified .env wouldn't be any different. They would just be offloaded into another file. When attached via the env_file option, all interpolation rules are in effect when injecting environment variables into the container. I verified this, and you can verify it with the instructions I gave.

robertbartel commented 2 months ago

I showed above that the effect of putting the variables in the specified .env wouldn't be any different. They would just be offloaded into another file.

Sorry, @christophertubbs, I did miss that it was actually nesting the interpolation functionality.

@aaraney, will the interpolation work this way when things are run through the deployx plugin for starting the stacks (it looks to me like Chris was using Compose instead of Swarm for that test, but if I've read that wrong, let me know)?