iszulcdeepsense commented 1 year ago

Should we consider keeping secrets outside of Kubernetes (Vault, etc)? So that once the kubernetes cluster is upgraded and wiped out, the jobs can be reconciled and revived automatically.

The only reason jobs are now LOST and can't be recreated is that the secrets are gone.

Let's consider using Vault by HashiCorp or other tool for secrets management.

Primary goal is to make all the jobs reproducible so they can be reconciled by Racetrack without constantly asking users to redeploy.
We can also consider letting users (job developers) use the new place for secrets directly, so they can push their git credentials and runtime environment variables to the Vault in the first place and stop keeping their secrets in local .env files.

JosefAssadERST commented 1 year ago

Right, so some unstructured and unordered thoughts.

The core idea is to outsource secrets management to a system off Kubernetes, Vault being the obvious candidate. Whether the secret owners populate them directly in Vault or keep putting them in env files is a secondary matter.
Does this create a hard dependency on Vault? I'm concerned about that since Vault is open core. Can such a dependency be softened by a "use Vault if one is available, otherwise don't use Vault"? If yes, is it a lot of code to maintain to soften it in this way?
Is Vault the only real game in town? Open Core isn't the only point of concern; a competitor might be fully FOSS but 99.99% of commits come from one commercial entity, raising the risk profile.
If we implement this and allow users to enter secrets in Vault, will they think it's nice on day 0 and get annoyed by the extra step on day 100? The truth is, this would mean job developers have two places to type the right thing not just one: the git repo, and in Vault.

@jsandroos @LookACastle @mariushart @Koshmaar thoughts?

LookACastle commented 1 year ago

I know of no option that's more "free as in freedom" than hashicorp vault's open source offering, they're the only real game in town I know of that's any degree of free.

If it's doable to support a vault without breaking the option of not having one, then I'd argue we don't let perfect be the enemy of good here. K8s has a decently quick release cadence, if people upgrade whenever a new version is released, they'd end up having to deal with lost jobs fairly often.

If both having and not having a vault is supported, then the concern of annoyed developers also kind of disappears - they can just not use the feature if they don't like it.

jsandroos commented 1 year ago

Overall, I don't see that a vault should be a hard requirement for running RaceTrack. As Thor writes, it is possible to implement the functionality without removing the option of not using a vault. In our use case a vault is surely nice to have.

Suppose we choose to implement a Vault.

There are several possible implementations, two of which are:

A) RaceTrack (RT) keeps a vault and simply stores the submitted secrets in that vault. This leaves little responsibility for change on the current users - they can simply continue as they do now, using their .env files and submitting secrets to RT.

B) We give RT the option to access a vault and grab secrets from there (as Josef also discusses). This puts more tasks on the RT users since they have to commit their secrets to the vault before deploying a job.

Both options A & B require implementation of vault integration in RT.

Option B might seem more secure, but users still need access keys for the vault, which are just as volatile as the vault, which indicates that option A could be the more feasible option. Option B could be made more secure by implementing a general security policy of keeping private keys private and locked on a fingerprint-locked USB stick. From what I hear, the lab is not currently in a state to take on this task.

One difference between A & B boils down to who commits the secrets to the vault: RT in case A or the users in case B.

Valleo4 commented 1 year ago

Data engineering has long had an open issue for implementing a vault for secrets, since we handle them in .env files, which is not ideal. So if you are considering a Vault, we would be interested in using it as well :)

Koshmaar commented 1 year ago

It seems to me that option B would fit better here. A tangential issue, though: perhaps we could avoid storing user-specific secrets (like their gitlab tokens) if we could use service accounts. I heard that one secret data scientists store in Jobs is their gitlab token, to access private repositories. You have probably already considered using gitlab Project Access Tokens. What was the reason against them?

Also, regarding option B, we could probably set a Vault policy which says that a user logged in with token ABC can access only the secret coupled with it, e.g.

Job Foobar stores vault_token: ABC, vault_path: foobar/secret in its .env
The Vault admin sets a key foobar/secret with content 123 (the secret which the Job needs to do its.. job, e.g. a db password), and sets a policy making it accessible only with token ABC.
When deployed, Job Foobar uses http with token ABC in its init func to fetch the secret 123 from vault_path.
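
The three steps above can be sketched in Python as follows. This is only an illustration, not something agreed in this thread: it assumes the secret lives in a KV version 2 secrets engine mounted at secret/ (so the HTTP path is /v1/secret/data/<vault_path> and the token goes in the X-Vault-Token header), and the function names and the vault.example.com address are made up for the example.

```python
import json
import urllib.request


def build_secret_request(vault_addr: str, vault_token: str, vault_path: str,
                         mount: str = "secret") -> urllib.request.Request:
    """Build the GET request a Job would send to read its own secret.

    Assumes a KV version 2 engine mounted at `mount` (an assumption).
    """
    url = f"{vault_addr.rstrip('/')}/v1/{mount}/data/{vault_path.strip('/')}"
    return urllib.request.Request(
        url, headers={"X-Vault-Token": vault_token}, method="GET"
    )


def extract_secret(response_body: bytes) -> dict:
    """KV version 2 wraps the stored key/value pairs in data.data."""
    return json.loads(response_body)["data"]["data"]


def fetch_secret(vault_addr: str, vault_token: str, vault_path: str) -> dict:
    """Fetch the secret over HTTP; would be called from the Job's init code."""
    request = build_secret_request(vault_addr, vault_token, vault_path)
    with urllib.request.urlopen(request, timeout=5) as response:
        return extract_secret(response.read())
```

In the Job's init function this would be called with the values from .env, for example fetch_secret("https://vault.example.com", "ABC", "foobar/secret"). The policy attached to token ABC would then grant read access to that one path only.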

That way, leaking a single password to Vault won't leak all the others, which is a typical problem in this area:

There are some approaches they recommend: https://developer.hashicorp.com/vault/tutorials/app-integration/secure-introduction?in=vault%2Fapp-integration Didn't read too much into details, but perhaps one of those could solve above issue.

If not, then maybe another, more custom method... I got lucky while googling and found this unpolished gem (or poo, you decide): http://assimilationsystems.com/2020/09/01/sharing-secrets-with-containers-using-custodia/ It shows how adding an overseeing component on the k8s server helps to avoid storing any secrets on the client, by identifying the process that tries to read the token through Linux socket proc info (you could think of it as "biometric" authorization). The software is here: https://github.com/latchset/custodia It's under GPL 3, but a bit forgotten and complex, so not ideal for integration.

But there's hope, as the author wrote a second article with an even simpler method that doesn't mess with k8s internals: http://assimilationsystems.com/2020/09/01/the-authproxy-method-of-sharing-secrets-safely-with-containers/ It requires Vault, but all secrets are stored there, and none need to be stored on the Job's disk. There's a little piece of code (authproxy) which listens on a socket, identifies the caller (e.g. by process name), and for authorized callers asks the Vault, using its embedded secret, for the required key and sends it directly to the app/Job (it could first check that, e.g., a docker container with a certain hash listens on a certain port). Please read the article for more info :) It seems rather secure and could fit the Racetrack model (RT could have a secrets plugin which, when a new Job is deployed, checks whether its manifest has authproxy: true and, if so, embeds an executable file with authproxy in the Job container), without making Vault a hard dependency. There's no reference implementation and it's not a trivial thing, but the logic is straightforward and should be implementable by anyone... at least that's my initial impression.
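
To make the "identifies the caller" step a bit more concrete, here is a minimal sketch of how a process listening on a Unix socket can ask the Linux kernel who is on the other end, using the SO_PEERCRED socket option. It is not the article's implementation (there is no reference implementation), it is Linux-only, and the helper names and the uid allowlist are assumptions chosen for illustration.

```python
import os
import socket
import struct
from typing import NamedTuple, Set


class PeerCredentials(NamedTuple):
    pid: int
    uid: int
    gid: int


def peer_credentials(conn: socket.socket) -> PeerCredentials:
    """Ask the kernel for the credentials of the peer of a Unix socket."""
    fmt = "iII"  # struct ucred: pid_t pid, uid_t uid, gid_t gid
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return PeerCredentials(*struct.unpack(fmt, raw))


def process_name(pid: int) -> str:
    """Look up the short process name of a pid via /proc (Linux only)."""
    with open(f"/proc/{pid}/comm") as comm_file:
        return comm_file.read().strip()


def is_authorized(creds: PeerCredentials, allowed_uids: Set[int]) -> bool:
    """Hypothetical policy: only callers running as an allowed uid pass."""
    return creds.uid in allowed_uids


if __name__ == "__main__":
    # A connected pair of Unix sockets stands in for proxy and caller.
    proxy_side, caller_side = socket.socketpair()
    creds = peer_credentials(proxy_side)
    print(creds, process_name(creds.pid), is_authorized(creds, {os.getuid()}))
    proxy_side.close()
    caller_side.close()
```

An authproxy built along these lines would only forward a request to Vault after a check like is_authorized passes, so the Vault credential stays inside the proxy and never reaches the Job.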

Sorry for wall of text, and hope this will be of any use.

LookACastle commented 1 year ago

Authproxy is a really interesting concept, great find! I'd be remiss if I didn't note that there's a known race condition that the article mentions and gives a few links to discussion on. My conclusion after reading is that it's a purely theoretical exploit and not worth worrying that much about, but we need to be aware that it exists.

jsandroos commented 1 year ago

Great stuff Hubert - some really good thoughts here.

It seems we have landed on a consensus that we do indeed need a vault - I will look further into the setup and the practicalities of where we can host it.

The articles look interesting and let's make an issue out of implementing their methodology, but maybe only after we have a functioning vault with the first RT access version. Maybe it's a good idea for V2?

iszulcdeepsense commented 1 year ago

In Racetrack, infrastructure plugins are responsible for taking care of secrets right now: https://github.com/TheRacetrack/racetrack/blob/master/lifecycle/lifecycle/deployer/deploy.py#LL49C3-L53C3

As discussed, we can bake Vault functionality into the Racetrack core as a configurable option. This will make things less intricate in the case of the Kubernetes plugin with Vault enabled.

JosefAssadERST commented 1 year ago

Here's a concrete proposal. Tell me how much of it you object to.

Bake Hashicorp Vault in to RT core.
Remove secrets management entirely from infrastructure target plugins (hereafter ITP)

Advantages:

RT admins don't have to do a lot of thinking about where a secret for a given job might be, or whether installing a new ITP will implicitly override anything or add new options somewhere (e.g. if that plugin has its own secrets management backend)
Secrets can be populated before the ITP is even installed (weird scenario, but people are often weird)
Fewer if/elses in core and potentially in plugins
It's probably the simplest solution that works.
Reduces the footprint of ITPs since it means they should explicitly not do secrets management any longer.

Disadvantages

It's open core not open source. We've documented our concern
Bloats RT core (a bit?)
Not the most "acrobatic" solution: it would be more flexible if an infrastructure type could offer an alternative secrets management backend, but: the truth is, we don't need different secrets management backends. We just need one that works for any ITP.

iszulcdeepsense commented 1 year ago

@JosefAssadERST I generally agree. Just wondering about one thing: what about keeping secrets in k8s? Do we want to completely drop it in favour of Vault?

JosefAssadERST commented 1 year ago

That's the idea. We don't know whether in ten years 99% of our jobs will be served in VMware or on docker daemons. Vault is a neutral place, and my proposal is intended to work regardless of infrastructure type.

iszulcdeepsense commented 1 year ago

Good. It also feels more natural to take secrets away from infrastructure plugins, especially when some of them (e.g. docker infrastructure) don't implement secrets at all.

LookACastle commented 1 year ago

I feel like limiting ourselves to only supporting an open core standard is dangerous. I'm all for supporting hashicorp vault in addition to what we already have, but supporting it instead of what we already have is a harder sell.

Open core will always seal something that's commonly wanted behind a paywall - and more importantly, proprietary code wall - that's their business model. An example for hashicorp vault is two factor authentication unless I'm misreading - which is something most organizations are probably gonna want to have.

Furthermore, I'm not sure I trust hashicorp vault to be permissive forever, I don't think they'll be able to be permissive forever. They release financial reports, and with my limited understanding, they don't look profitable. They're a company, they're either going to become profitable, or cease existing.

TL;DR: I like supporting vault. I dislike going all in on vault.

JosefAssadERST commented 1 year ago

I have no fundamental disagreements. Still, there's some really big advantages to splitting off secrets management. Can you suggest an alternative?

LookACastle commented 1 year ago

The alternative is supporting vault, without having it as explicitly the only option. Keep the old secrets management as well - or rip it out, but provide vault as some sort of plugin so we could at least in theory not have a vault if it starts becoming a problematic piece of software for us.

JosefAssadERST commented 1 year ago

I am also in favor of making Vault optional, but how would that work assuming we rip secrets management out of infrastructure target plugins?

Would there simply not be any support for secret management?

LookACastle commented 1 year ago

If it's deemed too difficult to make secrets management plugins, then I'd rather keep secrets management in ITPs, whether that means a load of ITPs like plugin-kubernetes-infrastructure-vault, plugin-kubernetes-infrastructure-k8s-secrets, and so on (no matter how inelegant I find that solution), or one big bloated plugin that supports all the secrets backends. I'd rather have either than have us explicitly tied to COSS.

I don't think (knock on wood) that secrets management plugins would be too difficult to make though. Getting the secrets securely is the hard part - and we'd have to do that anyhow - after that they should just be exposing data within the walled garden, no? It's not like a secret pulled from k8s is gonna be different from one pulled from vault - they have the same value.

If we do make secrets management plugins, the default would probably be no support for secrets without a plugin though, yes.

LookACastle commented 1 year ago

Actually, we could also add all the secrets stuff to racetrack core probably, but I still think that having it as plugins makes the most sense.

iszulcdeepsense commented 1 year ago

I don't have a strong opinion on your proposals; both sound reasonable. I just quickly checked, and having secrets plugins wouldn't be as much of a burden on code complexity as we anticipated: there are only 2 places, saving and reading. It will just proliferate plugins even more. I wonder if that's good or bad. It's good in terms of the possibilities it opens, and bad in terms of installing and upgrading them and maintaining the code.
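
As a rough illustration of the "only 2 places" point, a secrets backend could be reduced to an interface with one method for saving and one for reading. The sketch below is hypothetical: the class and method names are not Racetrack's actual API, and the in-memory backend only stands in for a real one such as Vault or k8s secrets.

```python
from typing import Dict, Protocol


class SecretsBackend(Protocol):
    """Hypothetical interface covering the two places secrets are touched."""

    def save_secrets(self, job_name: str, secrets: Dict[str, str]) -> None:
        ...

    def read_secrets(self, job_name: str) -> Dict[str, str]:
        ...


class InMemorySecretsBackend:
    """Stand-in backend used here only to show the shape of the interface."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, str]] = {}

    def save_secrets(self, job_name: str, secrets: Dict[str, str]) -> None:
        # Copy so later changes by the caller don't alter the stored values.
        self._store[job_name] = dict(secrets)

    def read_secrets(self, job_name: str) -> Dict[str, str]:
        # An unknown job simply has no secrets.
        return dict(self._store.get(job_name, {}))


def secrets_for_redeploy(backend: SecretsBackend, job_name: str) -> Dict[str, str]:
    """On reconciliation, the deployer only needs the read side."""
    return backend.read_secrets(job_name)
```

With a shape like this, each backend is one small class, whether it ends up shipped as a plugin or baked into the core.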

JosefAssadERST commented 1 year ago

How about:

Rip secrets management out of infr.t. plugins
Bake Hashicorp Vault support into core (i.e. not shipped with RT, but if one's available, the URL and credentials can be entered in the RT config)
Make it optional to use; if there's no Vault, RT works just as well but does not handle secrets, they get hardcoded into the jobs by job developers
Later if we find a good reason, we can make it a plugin
Later if a real FOSS alternative emerges, we can migrate to it

?
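
A minimal sketch of what "optional to use" could look like on the config side, assuming the RT config gains two optional fields for the Vault URL and credentials. The field names vault_url and vault_token and the helper functions are assumptions for illustration, not existing Racetrack settings.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SecretsConfig:
    """Hypothetical slice of the RT config; both fields may be left unset."""

    vault_url: Optional[str] = None
    vault_token: Optional[str] = None

    @property
    def vault_enabled(self) -> bool:
        # Vault is used only when both the URL and the credentials are given.
        return bool(self.vault_url) and bool(self.vault_token)


def load_secrets_config(raw: Mapping[str, str]) -> SecretsConfig:
    """Build the config from a parsed config mapping; empty values mean unset."""
    return SecretsConfig(
        vault_url=raw.get("vault_url") or None,
        vault_token=raw.get("vault_token") or None,
    )


def secrets_mode(config: SecretsConfig) -> str:
    """Return "vault" when a Vault is configured, otherwise "none"."""
    return "vault" if config.vault_enabled else "none"
```

With no Vault configured, secrets_mode returns "none" and RT would deploy jobs without handling any secrets, which matches the third point of the proposal.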

LookACastle commented 1 year ago

Good enough for me, not perfect - I'd prefer to just have it as a plugin - but perfect is the enemy of good. If it's easier to bake support into the core rather than start off with a plugin, I'd say that's the correct way to do it until something comes along to change our mind. YAGNI and all that.

iszulcdeepsense commented 1 year ago

Agree. Let's not fall into YAGNI, though I believe Vault can be "softly hardcoded" by making it as generic as possible (separate module, etc), having in mind a possible change in future.

TheRacetrack / racetrack

External place for keeping secrets #164
