Open iszulcdeepsense opened 1 year ago
Right, so some unstructured and unordered thoughts.
@jsandroos @LookACastle @mariushart @Koshmaar thoughts?
I know of no option that's more "free as in freedom" than hashicorp vault's open source offering, they're the only real game in town I know of that's any degree of free.
If it's doable to support a vault without breaking the option of not having one, then I'd argue we don't let perfect be the enemy of good here. K8s has a decently quick release cadence; if people upgrade whenever a new version is released, they'd end up having to deal with lost jobs fairly often.
If both having and not having a vault is supported, then the concern of annoyed developers also kind of disappears - they can just not use the feature if they don't like it.
Overall, I don't see that a vault should be a hard requirement for running RaceTrack. Like Thor writes, it is possible to implement the functionality without removing the option of not using a vault. In our use case a vault is surely nice to have.
Suppose we choose to implement a Vault.
There are several possible implementations, two of which are:
A) RaceTrack (RT) keeps a vault and simply stores the submitted secrets in that vault. This leaves little responsibility for change on the current users - they can simply continue as they do now, using their .env files and submitting secrets to RT.
B) We give RT the option to access a vault and grab secrets from there (as Josef also discusses). This puts more tasks on the RT users, since they have to commit their secrets to the vault before deploying a job.
Both options A and B require implementing vault integration in RT.
Option B might seem more secure, but users still need access keys for the vault, which are just as volatile as the vault, which indicates that option A could be a more feasible option. Option B could be made more secure by implementing a general security policy of keeping private keys private and locked on a fingerprint locked usb stick. From what I hear, the lab is not in a state to take on this task currently.
One difference between A & B boils down to who commits the secrets to the vault: RT in case A or the users in case B.
Data engineering has long had an issue open for implementing a vault for secrets, since we handle them in .env files, which is not ideal. So if you consider doing a Vault, we would be interested in using it as well :)
It seems to me that option B would fit better here. A tangential issue, though: perhaps we could avoid storing user-specific secrets (like their gitlab tokens) if we used service accounts. I heard that one secret data scientists store in Jobs is their gitlab token, to access private repositories. You have probably already considered using gitlab Project Access Tokens. What was the reason against?
Also, regarding option B, we could probably set a Vault policy saying that a user logged in with token ABC can access only the secret coupled with it, i.e. `vault_token: ABC, vault_path: foobar/secret`: the secret `foobar/secret` with content `123` (the secret the Job needs to do its... job, e.g. a db password) gets a policy making it accessible only with token `ABC`, and `vault_path` then yields the secret `123`. That way, leaking a single password to Vault won't leak all the others, which is a typical problem in this area.
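To make the per-token idea a bit more concrete, here is a minimal sketch that only generates the text of a Vault policy granting read access to a single secret. It assumes a KV version 2 secrets engine mounted at `secret` (an assumption, not something decided in this thread); the function name `single_secret_policy` is hypothetical.

```python
def single_secret_policy(mount: str, secret_path: str) -> str:
    """Return Vault policy text (HCL) granting read-only access to one secret.

    Assumption: the secret lives in a KV version 2 engine, where secret data
    is served under "<mount>/data/<path>".
    """
    full_path = f"{mount.strip('/')}/data/{secret_path.strip('/')}"
    return f'path "{full_path}" {{\n  capabilities = ["read"]\n}}\n'


# Example values taken from the comment above; the mount name is an assumption.
policy = single_secret_policy("secret", "foobar/secret")
print(policy)
```

The resulting text could be registered with `vault policy write` and attached to a token with `vault token create -policy=...`, so that the token can read `foobar/secret` and nothing else.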
There are some approaches they recommend: https://developer.hashicorp.com/vault/tutorials/app-integration/secure-introduction?in=vault%2Fapp-integration Didn't read too much into details, but perhaps one of those could solve above issue.
If not, then maybe other, more custom method... I was lucky when doing googling, and found this unpolished gem (or poo, you decide): http://assimilationsystems.com/2020/09/01/sharing-secrets-with-containers-using-custodia/ It shows how adding of overseeing component on k8s server helps to avoid storing any secrets on the client, by means of identifying the process that tries to read the token, through linux socket proc info (you could think of it as "biometric" authorization). The software is here: https://github.com/latchset/custodia On gpl 3, but a bit forgotten and complex, so not ideal for integration.
But there's hope, as the author wrote a second article with an even simpler method that doesn't mess with k8s internals: http://assimilationsystems.com/2020/09/01/the-authproxy-method-of-sharing-secrets-safely-with-containers/ It requires Vault, but all secrets are stored there and none need to be stored on the Job's disk. There's a little piece of code (authproxy) which listens on a socket, identifies the caller (e.g. by process name) and, for authorized callers, asks Vault for the required key using its embedded secret and sends it to the app/Job directly (it could first check that, e.g., a docker container with a certain hash listens on a certain port). Please read the article for more info :) It seems rather secure, and could fit the Racetrack model without making Vault a hard dependency: RT could have a secrets plugin which, when a new Job is deployed, checks whether its manifest has authproxy: true and, if so, embeds an executable file with authproxy in the Job container. There's no reference implementation and it's not a trivial thing, but the logic is straightforward and should be implementable by anyone... at least that's my initial impression.
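As a rough illustration of the "identify the caller over a socket" step, here is a Linux-only sketch using the `SO_PEERCRED` option of Unix domain sockets. It is not the article's implementation (there is none to copy), and `handle_request`, the allow-list and `fetch_secret` are hypothetical names; a real proxy would call Vault where the lambda stands in.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the peer of a Unix domain socket (Linux only)."""
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize("3i"))
    return struct.unpack("3i", raw)


def caller_executable(pid: int) -> str:
    """Resolve the executable path of a process via /proc."""
    return os.readlink(f"/proc/{pid}/exe")


def handle_request(conn, allowed_executables, fetch_secret):
    """Hand out a secret only if the calling executable is on the allow-list.

    allowed_executables maps an executable path to the secret name it may read.
    fetch_secret is a stand-in for the call to Vault (an assumption here).
    """
    pid, _uid, _gid = peer_credentials(conn)
    # Caveat: the PID could in principle be reused between the credential
    # check above and the /proc lookup below.
    exe = caller_executable(pid)
    if exe not in allowed_executables:
        return None
    return fetch_secret(allowed_executables[exe])


# Local demonstration: both ends of the socket pair belong to this process.
server_end, client_end = socket.socketpair()
this_exe = caller_executable(os.getpid())
granted = handle_request(server_end, {this_exe: "db_password"}, lambda name: "123")
denied = handle_request(server_end, {}, lambda name: "123")
print(granted, denied)
```

The point of the design is that the Job itself holds no credential: what authorizes it is which process is asking, as reported by the kernel.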
Sorry for wall of text, and hope this will be of any use.
Authproxy is a really interesting concept, great find! I'd be remiss if I didn't note that there's a known race condition that the article mentions and gives a few links to discussion on. My conclusion after reading is that it's a purely theoretical exploit and not worth worrying that much about, but we need to be aware that it exists.
Great stuff Hubert - some really good thoughts here.
It seems we have landed on a consensus that we do indeed need a vault - I will look further into the setup and the practicalities of where we can host it.
The articles look interesting and let's make an issue out of implementing their methodology, but maybe only after we have a functioning vault with the first RT access version. Maybe it's a good idea for V2?
In Racetrack, infrastructure plugins are responsible for taking care of secrets right now: https://github.com/TheRacetrack/racetrack/blob/master/lifecycle/lifecycle/deployer/deploy.py#LL49C3-L53C3
As discussed, we can make Vault functionality baked in the Racetrack core as a configurable option. This will make it less intricate in case of Kubernetes plugin + Vault enabled.
Here's a concrete proposal. Tell me how much of it you object to.
@JosefAssadERST I generally agree. Just wondering about one thing: what about keeping secrets in k8s? Do we want to completely drop it in favour of Vault?
That's the idea. We don't know if in ten years 99% of our jobs are served in VMWare or on docker daemons. Vault is a neutral place that my proposal intends to work regardless of infrastructure type.
Good. It also feels more natural to take secrets away from infrastructure plugins, especially when some of them (ie. docker infrastructure) don't implement secrets at all.
I feel like limiting ourselves to only supporting an open core standard is dangerous. I'm all for supporting hashicorp vault in addition to what we already have, but supporting it instead of what we already have is a harder sell.
Open core will always seal something that's commonly wanted behind a paywall - and more importantly, proprietary code wall - that's their business model. An example for hashicorp vault is two factor authentication unless I'm misreading - which is something most organizations are probably gonna want to have.
Furthermore, I'm not sure I trust hashicorp vault to be permissive forever, I don't think they'll be able to be permissive forever. They release financial reports, and with my limited understanding, they don't look profitable. They're a company, they're either going to become profitable, or cease existing.
TL;DR: I like supporting vault. I dislike going all in on vault.
I have no fundamental disagreements. Still, there's some really big advantages to splitting off secrets management. Can you suggest an alternative?
The alternative is supporting vault, without having it as explicitly the only option. Keep the old secrets management as well - or rip it out, but provide vault as some sort of plugin so we could at least in theory not have a vault if it starts becoming a problematic piece of software for us.
I am also in favor of making Vault optional, but how would that work assuming we rip secrets management out of infrastructure target plugins?
Would there simply not be any support for secret management?
If it's deemed too difficult to make secrets management plugins, then I'd rather keep secrets management in ITPs. Whether that be a load of ITPs like `plugin-kubernetes-infrastructure-vault`, `plugin-kubernetes-infrastructure-k8s-secrets`, and so on (no matter how inelegant I find that solution), or a big bloated plugin that supports all the secrets. I'd rather have either than have us explicitly tied to COSS.
I don't think (knock on wood) that secrets management plugins would be too difficult to make though. Getting the secrets securely is the hard part - and we'd have to do that anyhow - after that they should just be exposing data within the walled garden, no? It's not like a secret pulled from k8s is gonna be different from one pulled from vault - they have the same value.
If we do make secrets management plugins, the default would probably be no support for secrets without a plugin though, yes.
Actually, we could also add all the secrets stuff to racetrack core probably, but I still think that having it as plugins makes the most sense.
I don't have a strong opinion on your proposals, both sound reasonable. Just quickly checked that having secrets plugins wouldn't be too much of a burden on code complexity as we anticipated, there's only 2 places: saving and reading. It will just proliferate plugins even more. I wonder if it's good or bad. It's good in terms of various possibilities, it's bad in terms of installing them, upgrading and maintaining the code.
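To illustrate how small a "saving and reading" surface could be, here is a sketch of a pluggable secrets backend. All names (`SecretsBackend`, `NoSecretsBackend`, `InMemoryBackend`, `deploy_job`) are hypothetical and not Racetrack's actual API; the in-memory class merely stands in for Vault or Kubernetes secrets.

```python
from typing import Protocol


class SecretsBackend(Protocol):
    """The two operations a secrets plugin would need to provide."""

    def save(self, job_name: str, secrets: dict[str, str]) -> None: ...

    def read(self, job_name: str) -> dict[str, str]: ...


class NoSecretsBackend:
    """Default when no secrets plugin is installed: secrets are unsupported."""

    def save(self, job_name: str, secrets: dict[str, str]) -> None:
        if secrets:
            raise RuntimeError("no secrets backend configured")

    def read(self, job_name: str) -> dict[str, str]:
        return {}


class InMemoryBackend:
    """Stand-in for a real store such as Vault or Kubernetes secrets."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, str]] = {}

    def save(self, job_name: str, secrets: dict[str, str]) -> None:
        self._store[job_name] = dict(secrets)

    def read(self, job_name: str) -> dict[str, str]:
        return dict(self._store.get(job_name, {}))


def deploy_job(job_name: str, secrets: dict[str, str], backend: SecretsBackend) -> dict[str, str]:
    """Save secrets on deployment, then read them back as a redeploy would."""
    backend.save(job_name, secrets)
    return backend.read(job_name)


backend = InMemoryBackend()
env = deploy_job("adder", {"DB_PASSWORD": "123"}, backend)
print(env)
```

Because the secrets outlive any single deployment, a reconcile after a cluster wipe would only need `backend.read(job_name)`, whichever store sits behind it.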
How about:
?
Good enough for me, not perfect - I'd prefer to just have it as a plugin - but perfect is the enemy of good. If it's easier to bake support into the core rather than start off with a plugin, I'd say that's the correct way to do it until something comes along to change our mind. YAGNI and all that.
Agree. Let's not fall into YAGNI, though I believe Vault can be "softly hardcoded" by making it as generic as possible (separate module, etc), having in mind a possible change in future.
Should we consider keeping secrets outside of Kubernetes (Vault, etc)? So that once the kubernetes cluster is upgraded and wiped out, the jobs can be reconciled and revived automatically.
The only reason why jobs are now LOST and can't be recreated is that the secrets are gone.
Let's consider using Vault by HashiCorp or other tool for secrets management.
`.env` files.