catalyst / moodle-tool_dataflows

A generic workflow and processing engine which can be configured to do a large variety of tasks.
GNU General Public License v3.0
12 stars 8 forks source link

Redaction of secrets should only happen for non expression like inputs #612

Open keevan opened 1 year ago

keevan commented 1 year ago

This could be debatable, but consider the following situation

  1. I have many s3 steps.
  2. All of which require inputs to be set for the s3 connection credentials (bucket, key, secret, etc).
  3. To reduce the amount of configuration copied for each step, I use expressions to de-duplicate where they are configured.
  4. I would like to ultimately export this when it is ready, and import it into an environment for real usage.
  5. Once imported, I would only need to tweak any fields that were redacted.

So, because expressions in secret inputs are redacted. Instead of needing to add the secret in once. I now need to edit N steps to fill in the missing secret. Even if the value was actually an expression referencing the secret.

Proposal:

The setting might not need to happen if there is no big reason to have it. But this would improve the test, import and exporting process, in that there will be less things that need to be altered.

brendanheywood commented 1 year ago

In that scenario above I’d go one step further and reference a global var so you never set it in the flow in the first place. You should not need to set anything after a flow re-import.

Redaction in my mind means hiding it when shown, I’m border line about redaction in exporting. It sounds like this is complicating things and we could just export it. If you need the security then use a global var

keevan commented 1 year ago

In my real example it does live in a global var. Problem then is that it is dumped in the output logs.

I vaguely recall this being mentioned as unexpected behaviour?

keevan commented 1 year ago

Sorry I meant dataflow var. There is no global var setting yet afaik.

brendanheywood commented 1 year ago

Ok, but treat this as an output problem with the logs leaking a secret rather than a deeper problem with the export leaking a secret which is not a bug. Note that we cannot protect against an admin writing a flow which exfiltrates config, if a flow can use a secret then it must be able to use it, eg we must be able to give it to s3 in order to work at all. So we shouldn't waste too many brain cycles here hiding stuff in all cases especially if it is causing knock on complexity

keevan commented 1 year ago

Okay, so brain dumping out the scenarios where we currently output information. Then determine comparing with current behaviour if we should:

The changes proposed are based on the previous suggestion, so for the following scenarios:

As I write the above, I'm thinking main reason for this complexity currently is because we are handling "secrets" as part of working with dataflows. If this was abstracted into an integration or plugin which interfaces with a credential store e.g. vault, then this might be a bit easier to manage, and we could get away with calling it in an expression e.g. ${{ secrets.path.to.my.secret }} or ${{ secret('path/to/my/secret.field') }}. Then most controls here could be removed and it could be exposed as is, as the information would mainly be config (steps would not natively not expose secrets). Access to the credential store would be controlled at the site level or at the plugin level.

If that's too far out of scope, maybe a plugin level "secrets" store could be added, and would act like a simplified version of the would-be aforementioned integration. All fields marked as secrets in step types should be added as expressions, then they can just be exposed and stored and logged everywhere without actually exposing the underlying value.

A side thought is how far you want secrets management to go, whether it is write only and only steps can read and use it during execution, no where else, or same as previous but admins can read/write on some edit page.

brendanheywood commented 1 year ago

I'm happy with those 4 points, but I'd strongly lean towards breaking these up, deferring most of them, and doing only the bare minimum only when we actually need it. The only one which might be causing an issue right now which needs fixing is point 4.