Subprocess calls etc. in PulumiStack

JimMadge commented 1 year ago

Working through #1506 warnings were raised about possible shell injection/other attacks for subprocess calls. Having a look, I don't think I like what is happening in the PulumiStack class. For example,

https://github.com/alan-turing-institute/data-safe-haven/blob/f0a890632725e2b6883c8997787294b824caa1cb/data_safe_haven/pulumi/pulumi_stack.py#L201-L226

Nested try blocks are hard to read
Flow control relies on an exception being raised in self.whoami()
self.whoami() is a wrapper for pulumi whoami, there is a method for this in the Pulumi SDK
We are committing to being ABI(?) compatible with Pulumi (on top of being API compatible in other points in the code)
We hard code some decisions like the pulumi backend, removing flexibility

It feels like we have added code, and taken on maintenance burden, for something that is less flexible than saying "Run az login and pulumi login before running dsh" and just checking that in the code?

Does much of the rest of the code rely on these things? I'm worried we will end up with something quite hard to maintain.

jemrobinson commented 1 year ago

Summary

You're right that the requirement is to run az login and pulumi login before you run a command
It's not clear how often you need to run these as the tokens that they generate are valid for <some amount> of time
If we're going to recommend users to run them then our recommendation should be to run them both every time you run a dsh X command, which feels like an annoying amount of overhead. However, this isn't really a problem per se so if you think this is a good idea we can do so.
I think it's still necessary to check whether the user is logged in to Azure and to Pulumi in our code though - without this you get some really unhelpful errors from Pulumi eg. open keeper azurekeyvault://*****: failed to Dial default KeyVault

whoami from SDK

For the specific case of pulumi whoami, I don't think there is a method in the Pulumi Automation SDK for this - can you link to one if you've found one?

Why we need `az login`

Pulumi login using Azure storage as the backend makes a call using the az CLI (this isn't configurable to use the token we get from the Python SDK)
If this hasn't happened then the first pulumi X operation will fail.

JimMadge commented 1 year ago

who_am_i method.

This is a method of the Workspace class, which we can get from stack.workspace().

jemrobinson commented 1 year ago

Do you want to make a PR to switch to using this?

JimMadge commented 1 year ago

I can have a look.

I think it might require removing the functions which handle az login pulumi login. Actually, if we need a stack to exist to run stack.workspace().who_am_i() then we might only be able to do this after creating a stack object.

You can create a LocalWorkspace instance but that requires a Pulumi.yaml, Pulumi.stack.yaml. It feels like what you would use for a local program rather than an inline program.

jemrobinson commented 1 year ago

The current whoami function isn't actually needed, it just stops multiple calls being made to pulumi login.

JimMadge commented 1 year ago

Oh I see.

I think my position is, like in the issue, it would probably be better to require users to login to Azure and Pulumi and catch if they haven't.

jemrobinson commented 1 year ago

Happy for you to make that change. The exception that gets thrown if you try to log in to Pulumi without being logged into the Azure CLI is not an obvious one, but it should be possible to catch.

JimMadge commented 1 year ago

To add, some potentially confusing actions could be taken. For example if you are already logged in to a Pulumi account which does not contain your DSH deployments.

The dsh commands will continue as your are logged in. However, in the worst case this could lead to a new stack being created and existing resources being modified as the original stack was not found.

JimMadge commented 1 year ago

An alternative,

Functions to check whether logged in to Pulumi/AZ CLI, returns Username | None

Then one of (or both of?),

Single check
- Check at beginning of commands whether logged in
- If not logged in to either raise exception stating which
- If logged in to both
  1. Print usernames are continue, or,
  2. Ask user to confirm usernames are as expected
Multiple checks
- Create decorators to check if logged into Pulumi/AZ CLI and raise exception if not
- Decorate functions where required
- ```
@requires_az
def list_users(...):
```

I feel a check right at the top makes sense in any case. Decorating functions where they need features seems neat, but I'm not really sure it makes development easier. I'm not certain if we could do anything to handle errors better if, for example, a token expires part way through the program running.

@jemrobinson @craddm thoughts?

craddm commented 1 year ago

I agree a check at the top makes sense.

If we expect users to manage the Pulumi login themselves though, it needs to be clear how they do that. I don't know how to do it manually.

JimMadge commented 1 year ago

@jemrobinson Look around, I think that this shouldn't affect the Graph API stuff as that is handled separately.

Regarding that. Is this because Graph API is the only (or best) way to interact with Entra, or is a quirk of how we have the AAD on a different tenant to other resources?

JimMadge commented 1 year ago

If we expect users to manage the Pulumi login themselves though, it needs to be clear how they do that. I don't know how to do it manually.

Yes, at least to some extent. I think we must prompt and give a reasonable explanation. However, I think we should avoid

Taking on the burden of managing AZ CLI, Pulumi for users. At least from my experience here, I have found that confusing
Hiding the underlying systems from users. I think if you want to use this code, some familiarity with Pulumi, AZ CLI, Azure is necessary.

craddm commented 1 year ago

Agree, but point is that at the moment it isn't clear how or if you should interact with Pulumi manually. It's not as simple as pulumi login, or even manually specifying pulumi login azblob://<container-path>?storage_account=account_name. Your standard az login account doesn't necessarily have the right permissions to login that way. You can give it those permissions, but should you? If you can initially run all the scripts without your account having the Storage Blob Data Contributor role, should you be manually giving it that role later? Might that have some unintended consequence?

JimMadge commented 1 year ago

I'm not sure if I understand. However, those are the kinds of questions I think we shouldn't be answering or taking (too) strong opinions on.

pulumi login connects you to a backend. I don't think we should specify one backend over another, or build tools which assume a particular backend. I think the most we should say here is that a shared state is very helpful when you have multiple managers.

craddm commented 1 year ago

Ok, but the codebase as it is hardcodes the backend as Azure, no?

JimMadge commented 1 year ago

Ok, but the codebase as it is hardcodes the backend as Azure, no?

~~Yes, at the moment.~~ I think I would prefer it not to do that.

Actually, sort of. It will try to connect to a AZ blob backend if you are not connected to any backend. If you are connected to any backend, it will continue.

craddm commented 1 year ago

hmm, true! Again, think that shows we need to be clear about how exactly people use pulumi when using the codebase (even if that just means documenting that it'll use whatever backend you tell it to, but if you don't tell it otherwise it'll use azure)

(Linking to #1613)

jemrobinson commented 1 year ago

One of the things that the current system does is hide the complexity of logging into a custom Pulumi backend from the user. What you need to run is:

AZURE_STORAGE_ACCOUNT='<storage account name>' AZURE_STORAGE_KEY='<storage account key>' AZURE_KEYVAULT_AUTH_VIA_CLI='true' pulumi login azblob://<storage account container name>

It would be nice if we could come up with a solution that still hides as much of this as possible from the end user.

jemrobinson commented 1 year ago

Actually, sort of. It will try to connect to a AZ blob backend if you are not connected to any backend. If you are connected to any backend, it will continue.

The dsh init function sets up an Azure blob backend for you. The reasoning behind this is:

you're clearly running on Azure anyway if you're using this code so any costs arising will be lumped in with the rest of your infrastructure costs
it allows (or should do, subject to bug fixing!) multiple users to run using the same backend which I think isn't available with the Pulumi backend unless you pay for it (is that right @JimMadge?)

JimMadge commented 1 year ago

Good points @jemrobinson. The Pulumi backend question is more complicated.

Questions I think we need to answer,

Are Pulumi backends truly interchangeable?
1. In general
2. In the current code
What backend features do we need?
- Essential: state, state locking, encryption, sufficient security, modification audit
- Nice: visualisations, rollback
What shared configuration/secrets do we have, and what is the best way to share that (Azure storage accounts and keyvaults seem sensible as we will use Azure in any case)?
- Can everything go in one place, the Pulumi backend?

alan-turing-institute / data-safe-haven