geopython / GeoHealthCheck

Service Status and QoS Checker for OGC Web Services
https://geohealthcheck.org
MIT License
84 stars 71 forks source link

Differences between config_site.py and config_main.py #370

Open fsteggink opened 3 years ago

fsteggink commented 3 years ago

Hi, I've noticed a couple of differences between the files docker/config_site.py and GeoHealthCheck/config_main.py. The latter is containing the variables SQLALCHEMY_TRACK_MODIFICATIONS, GHC_METADATA_CACHE_SECS and GHC_PROBE_DEFAULTS, which are missing in the former. It might be a good idea to add them to the Docker config as well.

As an additional note, config_main.py is using arrays for the variables GHC_PLUGINS and GHC_USER_PLUGINS, but when reading them as env. vars in config_site.py, the values will be strings. That doesn't seem correct to me.

justb4 commented 3 years ago

GHC config is eventually a combination of GeoHealthCheck/config_main.py, instance/config_site.py and an optional GHC_SETTINGS var (which can be a file path). Config values are overridden in that order, so if e.g. a value is not present in config_site.py, it is taken from config_main.py. See also the documentation and the init.py code. One reason is to enable a minimal configuration with plenty of defaults. But yes for completeness may specify all.

GHC_PLUGINS and GHC_USER_PLUGINS: any format: comma-separated string, list set can be specified. The init.py code will resolve this. Also historic reasons: Docker and ENV vars came in later in the project.

fsteggink commented 3 years ago

Yes, I've read about the various configuration options, and eventually figured out that by specifying a custom file through GHC_SETTINGS I can also override things like probe defaults. However I have no idea if the missing variables in config_site.py (docker) is on purpose or not. I see no reason not to specify GHC_METADATA_CACHE_SECS through an env file for Docker.

Regarding GHC_PLUGINS / GHC_USER_PLUGINS, the documentation currently says the following:

GHC_PLUGINS: list of Core/built-in Plugin classes or modules available on installation

GHC_USER_PLUGINS: list of Plugin classes or modules provided by user (you)

It is not obvious that this should be a comma separated string.

fsteggink commented 3 years ago

Another thing that surprised me (and I didn't try to find the cause of it yet) is that the file docker/compose/ghc.env doesn't try to specify all different config variables. Yet, docker/config_site.py uses os.environ['VARNAME'] all over the place, instead of os.environ.get('VARNAME'), and it doesn't seem that GHC has any problems with it. GHC starts properly, and in the logging I don't see any possible errors related to missing env vars or KeyErrors like in this example:

>>> import os
>>> os.environ['GHC_DOESNOTEXIST']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Apps\Python37\lib\os.py", line 681, in __getitem__
    raise KeyError(key) from None
KeyError: 'GHC_DOESNOTEXIST'
justb4 commented 3 years ago

Forgot to add this (was in Gitter):
Correction: ENV defaults are set in the Dockerfile, so when not specified during deployment, those values are taken, hence os.environ.get('NAME', default) was not required.

fsteggink commented 3 years ago

Yes, that's right. I didn't think about any env vars set in the Dockerfile (or by some other means in the Docker container).

justb4 commented 3 years ago

But yes, there are now too many places to look for understanding the GHC configuration approach. So it used to be only config_main.py with all defaults to be overridden in config_site.py. With Docker/K8s the "env-var-strategy" was introduced. With again defaults needed, but now in the Dockerfile. This excludes the use of env-vars in regular/non-Docker deployments. Consequently, introducing a new config-var requires updates in quite some places (about 5, including docs).

Rethinking this could be enhanced:

Only problem is that this will affect many existing deployments, unless a very smart upgrade strategy is in place.

fsteggink commented 3 years ago

Although I'm not familiar with the GHC code, my first impression is that your suggestion looks good and is doable, without breaking changes.

The order settings are read in init.py:

        app.config.from_pyfile('config_main.py')
        app.config.from_pyfile('../instance/config_site.py')
        app.config.from_envvar('GHC_SETTINGS', silent=True)

"Traditional" GHC users likely use config_main.py and don't use config_site.py and likely have no need to use GHC_SETTINGS, right? So, I'd opt to leave config_main.py in place. That also means that this should be the single place where all defaults are specified, as suggested by you.

Config_site.py is maily used by Docker users. Since it's newer, it is probably less used. Config_site.py does not need to override all settings from config_main. In fact it doesn't, which is actually the reason I submitted this issue. So, what it should do in my opinion is to individually check the env vars, and only change the settings when the env var actually exists. So, using os.environ.get everywhere, and not just for GHC_PLUGINS and GHC_USER_PLUGINS. There is no need to use any of these env vars in the Dockerfile. In order to know what the defaults are, the documentation and perhaps Dockerfile and docker-compose.yml should be updated that one should look for the defaults in config_main.py.

There are a few more complex settings, like GHC_SMTP. If you only specify a few of the variables (like server, port, username, and password) and rely on some defaults (tls and ssl), there are some defaults which still need to be set. Is it viable to import config_main.py into config_site.py, in order to provide default values in these cases? Eventually, in that case you can also do something like os.environ.get('GHC_MYSETTING', config_main.GHC_MYSETTING), but that seems redundant in most cases.

What do you think of this approach?