concourse / hush-house

Concourse k8s-based environment
https://hush-house.pivotal.io

Road to a supported installation #27

Closed cirocosta closed 5 years ago

cirocosta commented 5 years ago

Hey,

Below is a list of items that we will need to complete/confirm before we can start moving/adding workloads onto hush-house:

Thanks!


cc @scottietremendous

YoussB commented 5 years ago

After talking to @cirocosta this morning, the steps that we should take are as follows:

@scottietremendous wdyt?

YoussB commented 5 years ago

Extra things to think about: (not for hush-house GA but might be useful)

scottietremendous commented 5 years ago

@YoussB

I tend to think we should just go with the repo for incident reporting since that's what we basically do now with Wings.

On the second note, I think we should try out as many of the new features Helm can provide us as possible. I think it's important we still use this as a place to experiment.

aegershman commented 5 years ago

> Decide between having the workloads on top of GKE or PKS. At the moment, deploying hush-house in PKS wouldn't give the team much more data than we'd get from GKE, where we already have it running; staying on GKE means we don't have to learn any details of PKS and can just move directly to what we already have.

Apologies for acting as some random dude interjecting my opinion here, but I'd gently suggest reconsidering.

Thoughts:

  1. There aren't too many details to learn that are unique to PKS's actual kubernetes clusters, but if you're going to be dog-fooding this with the intent for Pivotal customers to parrot it (like me! :D), it wouldn't hurt to get first-hand experience with the toolset they'll be using: what config options are available when setting up cluster plans, how to get credentials to the cluster, etc.
  2. If nothing else, it would be beneficial to remove as much IaaS-specific implementation as possible. The current deployment appears to use IaaS/GKE-specific implementation details for the web/worker.nodeSelector and loadBalancerIP. It seems like there's an opportunity to experiment not just with Concourse's stability on kubernetes, but with the operationalization of its deployment.
  3. Speaking of operationalization, you may also consider dogfooding multiple Concourse environments, e.g. a "sandbox" used to validate changes before promoting them to production. Not only does this provide the practical value of multiple environments for testing changes, it also helps suss out and optimize the workflow for "promoting" a series of changes from sandbox to prod.

3.1. Dogfooding the "environment promotion" workflow is beneficial because it helps drive out optimizations toward declarative configuration that can be statically defined in a helm values.yml rather than requiring an operator to go through "upgrade steps". The more configuration can be expressed as keys and values an operator sets up front, without "in-between" manual steps, the better.
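To illustrate point 2, one common approach is to keep a base values.yml IaaS-agnostic and push the GKE-specific bits into a per-environment overlay file. The key names below follow the chart values mentioned above (web/worker.nodeSelector, loadBalancerIP); the node-pool label and IP are made-up examples, not the real hush-house configuration:

```yaml
# values-gke.yml -- hypothetical per-environment overlay that isolates
# the IaaS-specific details; the base values.yml stays portable.
web:
  nodeSelector:
    cloud.google.com/gke-nodepool: web-pool    # GKE-only node label (example)
  service:
    loadBalancerIP: 35.0.0.10                  # GKE static IP (example)
worker:
  nodeSelector:
    cloud.google.com/gke-nodepool: worker-pool # GKE-only node label (example)
```

Promotion between environments (sandbox → prod, or GKE → PKS) then becomes deploying the same chart with a different overlay, e.g. `helm upgrade concourse -f values.yml -f values-gke.yml`, with no manual in-between steps.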

thanks for listening to my dissertation 👍

YoussB commented 5 years ago

Hey @aegershman,

thanks for the thoughtful comment. It makes a lot of sense 👍.

cirocosta commented 5 years ago

Hey @YoussB ,

In case we end up going with namespaced secrets as a way of leveraging k8s cred mgmt, we'd need https://github.com/concourse/docs/issues/96 tackled first, so that we can give the teams who end up consuming it a reference.
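For reference, with Concourse's Kubernetes credential manager each team gets its own namespace (a configured prefix plus the team name), and pipeline vars resolve to Secrets there. A rough sketch of what a consuming team would create, assuming a `concourse-` namespace prefix (the actual prefix is chart configuration, and the secret name here is illustrative):

```yaml
# A pipeline in team "main" referencing ((myrepo.token)) would resolve to
# the Secret named "myrepo" in the team's namespace, key "token".
apiVersion: v1
kind: Secret
metadata:
  name: myrepo
  namespace: concourse-main   # <prefix><team-name>
type: Opaque
stringData:
  token: s3cr3t               # consumed as ((myrepo.token))
```

This is the kind of consumption flow the docs issue above would need to spell out for teams.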

Thanks!

YoussB commented 5 years ago

makes sense :+1: