HDFGroup / hsds

Cloud-native, service based access to HDF data
https://www.hdfgroup.org/solutions/hdf-kita/
Apache License 2.0

Use config file for settings #50

Closed jreadey closed 4 years ago

jreadey commented 4 years ago

Rather than relying on environment variables, use a config file for settings. This would be mounted (for Docker) or loaded as a secret (Kubernetes).
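For Docker this could be as simple as a volume entry in the compose file; the fragment below is only a sketch (the service name and paths are illustrative, not the actual hsds compose file):

```yaml
# Sketch only: service name and paths are illustrative.
services:
  dn:
    volumes:
      - ./admin/config/config.yml:/config/config.yml:ro
```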

jreadey commented 4 years ago

Changes to support config files are checked into jreadey-master. POSIX, Docker, and Kubernetes should all be supported.

If anyone can try out this branch before I merge into master, that would be appreciated.

s004pmg commented 4 years ago

The commit looks reasonable to me; I'll try to get some burn-in time for this on DCOS.

s004pmg commented 4 years ago

@jreadey, I didn't expect that this would be exclusive. I.e., environment variables don't work at all as a configuration option now? I had thought it would be either one, or maybe both, with env variables taking precedence.

s004pmg commented 4 years ago

Oh, I see now... The config.yml must exist in the config directory. It doesn't look like the paths align to find admin/config/config.yml in the docker build at the moment.

jreadey commented 4 years ago

@s004pmg - there are 4 levels of config overrides (from lowest precedence to highest):

  1. value in config.yml
  2. value in override.yml
  3. environment variable
  4. command line switch

The problem with environment variables with Docker or Kubernetes is that they need to be explicitly passed in the docker-compose script or the k8s yaml config. As the number of config keys increased, this got to be a bit tedious, so I've put most of the config options in config.yml and removed most of them from the yaml.
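As a rough sketch of the idea (this is not the actual hsds config module; the function and key handling are only illustrative), resolving a single setting works something like this:

```python
# Illustrative sketch of a four-level precedence chain; not the actual hsds config code.
import os
import sys

import yaml  # PyYAML


def load_yml(path):
    """Return the parsed yaml file as a dict, or an empty dict if it doesn't exist."""
    try:
        with open(path) as f:
            return yaml.safe_load(f) or {}
    except FileNotFoundError:
        return {}


def get_setting(name, config_dir="/config"):
    # 1. lowest precedence: value from config.yml
    value = load_yml(os.path.join(config_dir, "config.yml")).get(name)
    # 2. value from override.yml, if present
    override = load_yml(os.path.join(config_dir, "override.yml")).get(name)
    if override is not None:
        value = override
    # 3. environment variable (upper-cased key)
    if name.upper() in os.environ:
        value = os.environ[name.upper()]
    # 4. highest precedence: a --name=value command line switch
    for arg in sys.argv[1:]:
        if arg.startswith(f"--{name}="):
            value = arg.split("=", 1)[1]
    return value
```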

For Kubernetes, the config.yml is passed to the pods in a ConfigMap. I'm not exactly sure what the equivalent would be for DCOS.
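For reference, a minimal ConfigMap wrapping config.yml might look like the sketch below (the name hsds-config and the log_level key are just illustrative; the real k8s manifests may differ):

```yaml
# Hypothetical ConfigMap carrying config.yml, to be mounted into the pods at /config.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hsds-config
data:
  config.yml: |
    log_level: INFO
```

It can also be created straight from the file with `kubectl create configmap hsds-config --from-file=config.yml`.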

Could you take a look at the changes in basenode.py? I made some changes to the DCOS-related code there, but don't have the ability to test them.

I'll check on the docker build now. This is for docker-compose.posix.yml?

s004pmg commented 4 years ago

I think that environment variables can be more common in DCOS, but either way works there.

I think I was too terse in my last message; here's the stack trace from running a Docker container built off of master:

```
Traceback (most recent call last):
  File "/usr/local/bin/hsds-datanode", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/hsds/datanode.py", line 230, in main
    log.info("datanode start")
  File "/usr/local/lib/python3.8/site-packages/hsds/hsds_logger.py", line 29, in info
    if config.get("log_level") not in ("ERROR", "WARNING", "WARN"):
  File "/usr/local/lib/python3.8/site-packages/hsds/config.py", line 111, in get
    _load_cfg()
  File "/usr/local/lib/python3.8/site-packages/hsds/config.py", line 30, in _load_cfg
    raise FileNotFoundError(msg)
FileNotFoundError: unable to find config file
```

So that's a new stack I got by swapping in a new build. My point is that if we're going to still allow folks to primarily configure via environment variables, then the docker build should produce a stock config of defaults at that location. We shouldn't force them to mount in a blank config if they prefer to configure via environment variables.

jreadey commented 4 years ago

Ok - got it. Try it out with this change: https://github.com/HDFGroup/hsds/commit/5b7a0f1fa1cd67d1fa789099f6bde89a183ca1ba. If /config/config.yml is not found, the server will fall back to /etc/config/config.yml (part of the Docker image).
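The lookup is roughly the following (a simplified sketch; the actual config.py code may differ):

```python
# Simplified sketch of the fallback lookup; not the exact config.py implementation.
import os

def find_config_file():
    # Prefer a mounted /config/config.yml, fall back to the copy baked into the image.
    for candidate in ("/config/config.yml", "/etc/config/config.yml"):
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError("unable to find config file")
```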

s004pmg commented 4 years ago

That gets the nodes starting, but now on to a new problem: it looks like node["host"] isn't getting set, so the cluster flails and doesn't self-organize.

I.e., this code in the headnode healthcheck fires:

```python
if node["host"] is None:
    fail_count += 1
    log.warn("Node found with missing host information.")
    continue
```

s004pmg commented 4 years ago

When I dump the node JSON, I get this:

```python
{'node_number': 0, 'node_type': 'dn', 'host': None, 'port': None, 'id': None}
```

It's been a while since I've been in there, but I don't remember that being a valid node definition.

s004pmg commented 4 years ago

We've been testing successfully with commit 532565f7bed8a5c5a966d8419d48d25acebc1363 for a few days now; I have no further concerns.

jreadey commented 4 years ago

Are you still getting the strange state with node JSON?

s004pmg commented 4 years ago

Well, yes, though it gets past it. I get these in the head node for a while at startup:

WARN> Node found with missing host information.

Then it seems to go away after several minutes and the cluster finally turns ready (probably after several nodes turn over and restart). It's probably still worth debugging further, since it may be delaying cluster startup; however, it does settle in now.

jreadey commented 4 years ago

I'll close this issue now. If anyone has questions or bugs with the config file usage, feel free to re-open.