cloudfoundry-community / vault-boshrelease

Apache License 2.0
28 stars 35 forks source link

Full config #51

Closed MattSurabian closed 6 years ago

MattSurabian commented 6 years ago

Starts to address #50

To do:

MattSurabian commented 6 years ago

In initial tests it seemed like we didn't need to pass cluster_addr as I noted in the description. That kinda of surprised me. I've done some more testing (fully destroying my deployment and rebuilding) and as it turns out that behavior was due to cached cluster info in my test node. I'm trying to come up with some way to specify a cluster_addr without using DNS in the deployment manifest so that this can work by default the same way it does now since our manifest has no access to spec.ip

MattSurabian commented 6 years ago

One thought would be to set the default manifest to a single instance and not HA, which would further push the idea that this repo is for tinkering and if you don't want that you should move to safe.

jhunt commented 6 years ago

It seems like the fundamental problems with just specifying all of the config via an opaque HCL blob in the BOSH manifest are:

  1. Handling IP addresses for cluster configuration
  2. TLS certificates

For (1), can we just keep those properties out of the config property? The motivation for this change was to stop merging in pull requests for "I want to use this backend!"

For (2), can we just introduce a general TLS property, that works like this:

  properties:
    tls:
      - name: consul-thing
        certificate: ...
        key: ...

And then populate the files via something like ttar, and letting people reference the certs as /var/vcap/jobs/vault/tls/$NAME/cert.pem and /var/vcap/jobs/vault/tls/$NAME/key.pem from inside their config HCL?

MattSurabian commented 6 years ago

For 1) Yes we definitely can do this. In practicality folks should be using DNS anyway. If we leave it out that just means we're not supporting HA by default on this, which as I mentioned earlier I think is fine. :+1:

For 2) ttar is ideal for that problem, thanks for suggesting it! I didn't realize it was a thing. As for the single config property for TLS I totally agree. I think a follow up release should remove the backends (a PR I'm happy to make) and at that time the TLS property should come into play so it's a single breaking change of the config format which will hopefully be easier on users. What do you think?

jhunt commented 6 years ago

Let me think about it over the weekend.

MattSurabian commented 6 years ago

I've been thinking as well over the weekend and am leaning towards something like this:

1) The raw config property is added in the next release (0.9.0, the one with new backends) but the default configuration is not modified at all.

2) The raw config property becomes the default, all backends are removed, and the new TLS property is implemented. This would either be 0.10.0 or 1.0.0, which considering it's a breaking config change maybe 1.0.0 makes more sense? Your call. This version would essentially be an empty wrapper around the Vault binary only, anyone looking for an out of the box solution would then "have" to use safe; anyone using custom backends or wanting direct control of the config will use this bosh release.

That would mean I'd change this PR to not modify the default manifest at all and open a new PR that modified the default config and removed all backend logic, templates, and packages.

jhunt commented 6 years ago

I like the idea of making a clean break with 1.0.0.

So 1.0.0 would contain nothing but the default property and the TLS+ttar stuff for getting certs into the installation?

How would we handle IP addresses and cluster stuff (which normally comes from BOSH)?

MattSurabian commented 6 years ago

I guess there's a potential chicken egg problem. If folks are using DNS they can just explicitly write the fully qualified domain name. If they aren't there isn't a good way I'm aware of to get that information at manifest parsing time. But if they aren't using static IPs then it's hard to rig up DNS...

jhunt commented 6 years ago

What if we limit the scope of the configuration extensibility to just storage backends? That was the original pain point I wanted to solve (i.e. not having to cut a release for every custom storage backend and new property people wanted to change)

MattSurabian commented 6 years ago

The trouble is that a lot of these settings are in the storage backend block. What if there was a known value that we could replace when we process the template? I think this release used to support something like that in the past for instance number? In this case I think we'd only need IP or FQDN (fully qualified domain name)

MattSurabian commented 6 years ago

I opted for %%ip%% and %%index%%. Those seem to be the only spec items necessary given the current state of the configuration. Since we're passing a full config I changed the replace delimiters to %% from ( to make it a less common character.

jhunt commented 6 years ago

How common is (ip) and (index) in Vault configs?

MattSurabian commented 6 years ago

I've started deploying the current state of this to some development environments. I propose we merge this where it's at and I'll open a new PR for the ttar work we discussed above.

MattSurabian commented 6 years ago

Ah sorry, didn't notice the last note you posted.

I don't think it's super common, but I could see them being used maybe in a comment or something? I'm fine switching back to that if you'd prefer.

jhunt commented 6 years ago

Let's go ahead and switch back to that, and then I'll merge this into a branch other than master. I'd still like to see the ttar work integrated before we land this on master.

MattSurabian commented 6 years ago

Sounds good to me! Just updated the template values and README. Weren't we going to land this as 0.9.0 and then make the ttar and backend removal stuff 1.0? Doesn't matter to me either way just wondering whether I should put the ttar stuff on this branch or a new PR

jhunt commented 6 years ago

it's a substantial change from the 0.x line. Per semantic versioning we should really bump major, which means 1.0.

How hard is it to do the ttar stuff? If you don't want to tackle it, I can try to carve out some time this week to give it a go...

MattSurabian commented 6 years ago

No worries at all, in that case I'll add it to this branch. I've just started on the ttar stuff. Seems pretty straightforward so I should be done by this afternoon baring any :fire:

MattSurabian commented 6 years ago

The only thing I'm not totally sure how to do is update the blobstore with the latest ttar.

jhunt commented 6 years ago

Just put ttar in the src/ttar/ directory, and commit it to the repo.

MattSurabian commented 6 years ago

I haven't yet played with service brokers so I'm not sure if the service-broker code in here is something we should leave in or expect folks to deploy it separately ?

jhunt commented 6 years ago

I think we can just remove the vault-broker since the safe BOSH release has it.

MattSurabian commented 6 years ago

@jhunt I've deployed this version to a few directors to test it and it seems to be working as expected. Let me know if you need anything else from me in order to land

kitsirota commented 6 years ago

@jhunt any chance we could merge this in and cut a build?

Thanks!!!

jhunt commented 6 years ago

I dropped a note in #vault on CF slack, to see if anyone has any concerns or objections to merging this; otherwise I'll be merging it by the end of next week.

jhunt commented 6 years ago

@MattSurabian can you resolve the merge conflicts? I would, but I can't push to your Github org.

MattSurabian commented 6 years ago

No problem! Just resolved the conflicts; I left the vault-broker meta data in the blobs.yml file because I know some older releases might rely on that metadata being present. Maybe that's unnecessary or ill-advised and it should be removed?

It seems like the CI failure is related to the bosh director being used for testing and not the updated content of this pr?

I'm not able to kick that pipeline off manually otherwise I would

jhunt commented 6 years ago

CI is failing because someone thought our target BOSH lite wasn't worth keeping around.