driplineorg / dripline-cpp

C++ Implementation of the Dripline framework
http://driplineorg.github.io
Apache License 2.0
1 stars 0 forks source link

Replace .project8_authentications.json with more general auth scheme #34

Open buzinsky opened 4 years ago

buzinsky commented 4 years ago

Unlike the standard username/ password pairs, kubernetes uses Secrets (https://kubernetes.io/docs/concepts/configuration/secret/), for more secure authentication. Putting these secrets into the existing .project8_authentications.json file is expected to be awkward, given the conventions of the file format chosen

Reconfigure the authentication to be more general, given this particular (K8) and other possible authentication formats. One suggestion was to spread the authentication tokens over different files, marked by environmental variables of the form DRIPLINE_XXX, where a function would loop over these files, processing each one in turn.

Suggestions for implementation/ design are appreciated: so is a refinement of the problem, if the focus is off.

laroque commented 4 years ago

As written, this is probably a duplicate of #25 , though when we discussed I had something more generally useful in mind (with an eye to exactly the use case you describe here).

At least for now I would suggest not eliminating the authentications file, but instead providing an additional way to pass in that information (with a well-defined priority ordering). If we find it isn't useful or causes problems, we could consider deprecating the authentications file as a separate issue/task.

For this local case, we're looking at the logic which starts here: https://github.com/driplineorg/dripline-cpp/blob/7c3872ada050509855b4988f37cd3f869e62b4c3/library/core.cc#L53

We could extend this be deciding some location in that sequence where the parameters should be changed to match certain environment variables, if they are defined; it would solve the local problem but not be generally useful.

I think that instead, we would like to be integrated with https://github.com/project8/scarab/tree/master/library/cli which I could see happening in a couple of different ways: 1) when writing a new "application" in scarab's CLI sense, in addition to defining CLI args, you could also define specific environment variables to go with those arguments. Then when parsing, if the environment variable is present, it would be as if its value had been passed to the flag (the CLI flag itself should still override the env variable). 2) perhaps scarab should support constructing some config object of global data that is (partly?) populated from env variables, possibly starting from something like: https://github.com/project8/scarab/blob/master_v1/library/param/global_config.hh ... Perhaps the constructor takes a PREFIX and all env vars matching that prefix populate some param_node? Any code built on scarab would then have access via something like get_instance(). This seems especially useful for things like a postgres interface, where the standard config param objects aren't being passed to a class that isn't in the core inheritance, but still needs authentication details.

nsoblath commented 4 years ago

Since we're talking about how to configure an application, it clearly makes sense to aim for a solution that's well integrated with the CLI portion of scarab, the part of the library that @laroque pointed out. That part of the scarab library is responsible for a standard user interface, and results in the population of a master-config object (a param_node). Scarab's CLI capabilities are based on the CLI11 library. Scarab essentially extends portions of that library in a way that integrates it with a framework for populating the master config.

The standard order of precedence for populating that config is:

default values --> user-provided config file --> non-option CL arguments --> application-specified CL options

CLI11 actually already has the ability to pickup values from environment variables. Therefore I propose that we take advantage of that capability and adjust the order of precedence for the master config accordingly. In fact, without any code changes necessary, we already have the ability to use environment variables. It's just not documented in scarab.

According to the documentation, for a given CL option with a specified environment variable, if the option is not given on the CL, then the environment variable will be checked and used if present. With the existing code, the order of precedence becomes:

default values --> user-provided config file --> non-option CL arguments --> environment variables --> application-specified CL options

There's one bit of awkwardness to this, which is that one type of CL entry, the non-option argument, is overridden by an environment variable, while another type of CL entry, the specified option, overrides an environment variable. However, there's also logic to say this is okay: non-option arguments can be anything, while environment variables and option arguments are specified by the application, and therefore have precedence.

I should also note that I haven't tested this capability, but I'm pretty confident it will work more-or-less out of the box, as the box currently stands.

In the context of a dripline application, there are extra layers of specification. In the core class, the order of precedence for member variable specification is:

default values --> auth file (for broker and broker-port) --> config object --> constructor arguments.

Everything we've talked about so far above goes towards specifying the config object, so the order of precedence is still understandable.

laroque commented 4 years ago

To be concrete with a second use case, let's consider the slack_relay service. It binds the alerts exchange with status_message.# where it expects messages of the form status_message.<message_severity>.<sender_name>. It then has a mapping between message_severity values and slack channels, and uses the sender_name to set the "username" when posting. In order to do this, it needs several "token" values so that it can interact with slack's REST API (similar for google in operator bot, or postgres in the sensor logger).

Say I were to set an environment variable DRIPLINE_SLACK_API_TOKEN="xoxb-12341234-ABCDEFG", how would I access that in this code:

class SlackRelay(Service):
    def __init__(self, api_token=None, **kwargs):
        Service.__init__(self, **kwargs)
        #some magic here to check if there's a value in the master config and update if passed value is None?
        self._api_token = api_token
nsoblath commented 4 years ago

Ben and I had a conversation on Slack that eventually came up with a solution that we think might work. I'll try to summarize the conclusions here.

All of this is about putting together a master configuration (in the form of a param_node, for instance) that contains all of the necessary information for an application. For applications like dl-agent the pieces involved are all known at compile time. So all of the relevant CL options and any mappings to environment variables can be hard-coded.

On the other hand, for applications like dl-serve (and psyllid and katydid), the components that will be used are not known at compile time. Using psyllid as an example, a psyllid config file specifies a list of "nodes" that get created at runtime, and the configuration of that node that is applied once it's created. This is the sort of situation we're dealing with in dl-serve as well.

Speaking in the concrete terms of a psyllid configuration, there could be situations where a node needs a piece of global configuration information (this has come up in psyllid, actually, and it's not well handled at the moment). For example, let's say a node wants to access a database. It will need the DB username and password. We don't want to put that in the config file, so we could imagine that it's put in a pair of environment variables at runtime. Then the configuration process needs to be able to access that information. However, it might not be just that node that wants DB access. So it'd be better to have that information at a global level in the configuration, and be able to refer to it from the node's configuration. That leads us to want a couple of concrete features:

  1. Access environment variables from a configuration
  2. Set a value by referring to another place in the configuration file

While we're it I'll add a third point that would be convenient and is similar in scope:

  1. Set and use variables within the configuration

So all of this points towards a system where we process a configuration to do those three things. We would need to use some unique syntax to avoid clashing with jinja2 (currently used for helm-based deployments) and other common systems.

I would suggest, then, the following procedure for building a configuration:

  1. Do the existing configuration building: defaults --> user config file --> non-option CL arguments --> CL options (also allowing for environment variable substitution via CLI11)
  2. Pull variable assignments from the configuration (I suggest a simple global scope for variables values)
  3. Resolve intra-document links
  4. Perform variable substitution
  5. Pull information from environment variables

The order of operations above needs to be thought through carefully, and my suggestion here is only preliminary.

This would all go in Scarab and integrate with the existing application/CLI setup.