Passwords and keys are stored unencrypted in the datadir

jeanpommier commented 1 year ago

As it is right now, the datadir is stored in a git repo that is cloned when starting a georchestra deployment. This means you have necessary to be using a private git repo, because the passwords and keys (recaptcha for instance) are stored unencrypted in this repo. I must admit I'm not really fond of this. My proposal would be to add the possibility to apply a bunch of sed commands in order to replace at runtime the default passwords and keys with the actual production ones, that could then be stored in properly secured secrets. What do you think ? It probably can be done with an additional initcontainer following the one doing the git clone.

edevosc2c commented 1 year ago

I do agree with the general idea of keeping secret the passwords, but it all comes down to tradeoffs. It would be very tedious to support all the possible properties where you store passwords or keys.

I mean by that for me, it makes sense to use a private repository in the first place, the datadir contains a lot of sensitive information that shouldn't be public, not necessarily passwords or keys. For instance, in my opinion it's ok to store recaptcha keys in git, this won't hurt the georchestra platform but passwords to the database will potentially allow an attacker to manipulate the georchestra platform.

We have to draw a line, what we consider keeping secret inside the kubernetes cluster and what can be kept inside git. I think we should start to list what should be protected in the kubernetes cluster, what passwords could be stored currently in the datadir?

jeanpommier commented 1 year ago

I'm not sure we even have to do that. It depends also on the implementation. I was thinking maybe something flexible, like a list of strings that we want to run sed for. For each of those strings, we could give a list of paths (files) and the target value. This would be easy to configure, very flexible and I believe quite simple to implement

jeanpommier commented 1 year ago

Not certain it will work for every possible case though

jeanpommier commented 1 year ago

Personnally, I quite like to keep the same datadir for testing and prod environments and just replace at runtime with the few specific items, but I understand it could look overkill.

We could just accept to append additional shell commands at the end of the initcontainer's commands. This would cover this need (the user can add his custom sed commands) and more.

edevosc2c commented 1 year ago

Personnally, I quite like to keep the same datadir for testing and prod environments and just replace at runtime with the few specific items, but I understand it could look overkill.

We could just accept to append additional shell commands at the end of the initcontainer's commands. This would cover this need (the user can add his custom sed commands) and more.

That's what I have been thinking about. Usually in big/major docker images, this is done by having the ENTRYPOINT script executing whatever script it finds inside the directory /docker-entrypoint.d. Here is an example for a docker image of bitnami/ldap:

root@georchestra-ldap-6c9d5c875-4nwh6:/docker-entrypoint.d# ls
00-init  01-populate  02-indexes  utils

We could introduce some kind of support for this. So you would be able to add scripts through the helm values (so stored inside a secret/configmap?) or the scripts would be inside the datadir (maybe not a good idea if you want to add passwords).

I'm not sure we even have to do that. It depends also on the implementation. I was thinking maybe something flexible, like a list of strings that we want to run sed for. For each of those strings, we could give a list of paths (files) and the target value. This would be easy to configure, very flexible and I believe quite simple to implement

(Will reply soon, I have not found the time yet to think about)

jeanpommier commented 1 year ago

I'd propose to add this at the end of the git initcontainer's command:

    {{- if .Values.georchestra.datadir.customInitCommands }}
    {{ .Values.georchestra.datadir.customInitCommands |nindent 4 }}
    {{- end }}

which could be configured in the values.yaml with something like

    customInitCommands: |
      find /etc/georchestra -type f -exec sed -i "s|gerlsSnFd6SmM|${GEOSERVER_PRIVILEGED_USER_PASSWORD}|" {};
      sed -i "s|privateKey=.*|privateKey=${RECAPTCHA_PRIVATE_KEY}|" /etc/georchestra/console/console.properties;

And we would also need to support reading additional env vars in the initcontainer, a priori from a secret file, I'd say

jeanpommier commented 1 year ago

The bad thing is that if we want to have the full datadir in sync everywhere, it would mean giving the whole bunch of password to each initcontainer...

jeanpommier commented 1 year ago

Mmh, and we also need to pass those password values, for instance with an extra_environment parameter like for regular geOrchestra containers

edevosc2c commented 1 year ago

It's quite hard to make this secret management generic, works for most of the people. While this custom init script idea is a neat feature, this could become quite a mess to maintain.

Ideally, the app should be able to have parameters overridden by environment variables. This way we can set envs from secrets like this:

extra_environment:
  - name: RECAPTCHA_PRIVATE_KEY
    valueFrom:
      secretKeyRef:
        name: recaptcha-credentials
        key: recaptchaprivatekey

Isn't this already the case for the console application?

jeanpommier commented 1 year ago

Nope, not implemented. The root idea is to take the configuration from the "datadir" configuration files and while being able to override some of them using environment variables makes sense in a container world, I believe it could also annoy people not using containers. For instance, the risk of an env. var used for another app, messing up stuff. Which might come quite hard to debug. So I'd say this is mostly a docker-oriented topic and would better be kept on the docker-implementation side of it.

I'd say that if we support extra-environment variables and a custom init script as I proposed, that should be enough and very flexible, don't you think so ?

jeanpommier commented 1 year ago

Mmmh, I believe I'm reinventing the wheel: since the environment variables in the datadir's .properties files are resolved by the java webapps, every container supporting the definition of extra_environments already provides the expected mechanism. Here's what one can already do:

configure the value with an env var in the corresponding files in the datadir
define an extra_environment that can, if desired, tap into an existing secret to get the value

Maybe anyway it might be interesting to allow adding some additional initcontainers that could cover edge cases, but this it another matter. I'm closing this ticket.

georchestra / helm-georchestra

Passwords and keys are stored unencrypted in the datadir #46