ERDDAP / erddap

ERDDAP is a scientific data server that gives users a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. ERDDAP is a Free and Open Source (Apache and Apache-like) Java Servlet from NOAA NMFS SWFSC Environmental Research Division (ERD).
Creative Commons Zero v1.0 Universal
84 stars 58 forks source link

allow environment variables to alter the setup #53

Closed marc-portier closed 3 years ago

marc-portier commented 3 years ago

Currently setup is driven through setup.xml (read through ResourceBundle2 from cohort) which in turn builds up the EDStatic making those settings available throughout the code.

In modern microservice and container setups (e.g. axiom/docker-erddap, specially in combination with nginx or apache reverse proxy forwarding) however a new separation of concerns is pushing up. Who creates what parts of the setup.xml is not as straightforward any more. This could be facilitated by allowing the setup to be driven by environment variables like ERDDAP.baseUrl and ERDDAP.baseHttpsUrl

These would then be read by System.getenv and injected in stead of the settings in the XML file. Given that ResourceBundle2 already functions as a decorator for the java native ResourceBundle, this extra layer could maybe added on that level?

BobSimons commented 3 years ago

Thank you for your well-thought-out suggestion.

I think of setup.xml as an interface. The admin can use any method they want to make changes to setup.xml (e.g., by hand, by script) and can even specify where setup.xml is located (see https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#erddapContentDirectory ). Then ERDDAP reads from that interface. The admin has full control of setup.xml. So part of me says: isn't that sufficient?

But I see it would be convenient to use environment variables (or system properties). Are you suggesting that any parameter in setup.xml could be overridden by an environment variable or just a select few parameters? If just a few, which ones do you propose?

Are you specifically suggesting/requesting that these environment variables have names using the form "ERDDAP.[parameterName]"? Do all operating systems support "." in environment variables? Would "_" be safer as a separator?

Can anyone think of any security issues related to this? I can't, because the environment variables shouldn't be accessible by other users (other than root), but I could easily be missing something. Some of the parameters need to be kept secret, e.g., flagKeyKey, googleClientID, and orchidClientSecret.

Any other votes for or against this?

mwengren commented 3 years ago

@BobSimons I ran into this issue when trying to run ERDDAP instances on Kubernetes a few years ago.

IIRC, I had to basically script a find/replace process to modify setup.xml on ERDDAP pod deployment using some dedicated Kubernetes pods to make changes to setup.xml parameters, which got a bit tedious. There may have been better approaches to that than what I did at the time, but this would make that deploy process much much easier.

Not sure on the security implications. Tomcat itself can typically be tuned via user or system environment variables that override config file settings, correct (CATALINA_HOME and such)?

FWIW, it's ugly this is what my ERDDAP Kubernetes orchestration code did wrt setup.xml:

https://github.com/mwengren/kuberr/blob/master/kuberr/erddap_config.py#L175-L221

Basically, took an default setup.xml file, stored it in Kubernetes as a ConfigMap, replaced values via pod environment variables assigned by the admin, and then the deployed ERDDAP pod(s) would map that file contents to their usual setup.xml file location in the Docker container.

I know ERDDAP isn't meant to be run on Kubernetes, but no doubt others will try like I did. A plain single-instance Docker deployment would benefit from these changes as well, I think.

marc-portier commented 3 years ago

@BobSimons, thx for considering this.

trying to address the questions/remarks:

interface/contract I agree on the interface statement, and see how adding environment variables might blur the clarity of that

Recreating some clarity at run-time we might consider adding logging-statements to the code: mentioning what source was taken for which property (While listing the actual value in the log should be avoided for security - see further?)

scope

Yes, I am inclined to allow this for all parameters in setup.xml

Motivation being

format

Yes, I think some kind of prefixing makes sense here.

Equally yes: _ (underscore) is a better separator than . (dot)

security issues

Can't say I know enough of the internals of ERDDAP already to make a full assessment on this.

From a principled POV though the ability to support multiple sources for injecting settings would allow for more (not less?) ways to properly shield some settings from people that should not be able to control/see them.

As such the more prominent aspect on this topic might be deciding which source of properties should take precedence over which. Current feeling: System.env value should rule over setup.xml

This would also allow typical CI/CD scenario's where a minimal/default/safe setup.xml could be shared via co-developers in a git repository, while the automated test and deploy environments might inject (secret) settings that override those in test and production.

BobSimons commented 3 years ago

Should this system use environmental variables or system properties? There are security concerns since admins will be using this system to pass secret information (e.g., flagKeyKey, googleClientID, and orchidClientSecret). The question is: once a value is set by the admin, could some bad guy (e.g., another user on the server) read the value? I don't know enough to answer that question. Comments?

Java's JavaDoc says "System properties and environment variables are both conceptually mappings between names and values. Both mechanisms can be used to pass user-defined information to a Java process. Environment variables have a more global effect, because they are visible to all descendants of the process which defines them, not just the immediate Java subprocess. They can have subtly different semantics, such as case insensitivity, on different operating systems. For these reasons, environment variables are more likely to have unintended side effects. It is best to use system properties where possible. Environment variables should be used when a global effect is desired, or when an external system interface requires an environment variable (such as PATH)."

abkfenris commented 3 years ago

We're running a few ERDDAP instances in Kubernetes and allowing setup.xml configuration with environment variables would be fantastic.

I believe @Dylan-Pugh has basically been keeping two different setup.xmls around. One gets baked into the image for deployment, and then gets mounted over for local testing with a different baseUrl. It would be much easier to manage with being able to override baseUrl with an environment variable (and less likely to have configuration drift).

Using system properties would make it much harder to use ERDDAP in different contexts as most systems don't have ways to control their configuration natively. Where-as with environment variables, we can set them locally with export, in a docker-compose.yaml configuration, or with a Kubernetes manifest, or various secret handling machinery like Vault.

As for security, while there is a tendency back away from environment variables for secret configuration right now (its easier to find secrets with an env if you take over a process than to find various secret files), if someone gains access to the ERDDAP process, then they also have access to setup.xml or system properties. I haven't looked recently at Google or Orchid oauth, but for most auth systems you can limit the sending domains. Similarly with AWS credentials, you can limit the them to read only access of specific S3 buckets, so you can have multiple security boundaries.

I believe it's more likely that setup.xml is likely to be left with overly generous permissions that would allow a neighboring user to access it, then the process orchestrating ERDDAP is likely to leak environment variables to a neighbor.

BobSimons commented 3 years ago

@abkfenris and @mwengren, thanks very much for that information. @marc-portier, thanks very much for the original request/suggestion and follow-up information.

Okay. I have made it so that any parameter in setup.xml can also be specified (with higher priority) via an environment variable named ERDDAP_parameterName . This feature will be in the next ERDDAP release.

BobSimons commented 3 years ago

@marc-portier The just-released v2.14 of ERDDAP has this feature. Please email me (bob.simons at noaa.gov) so we can talk offline.