Closed jgraettinger closed 2 years ago
I've added a slew of new "snapshot" tests to the `ops` repo, which are lower-level and "closer" tests of particular library functionality. At least to me, they make it much easier to understand cause and effect when working on jsonnet libraries. These are supplemental to environment snapshots, which still have a lot of utility IMO for understanding the grounded-out implications of a considered change.
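To make the idea of a tightly scoped snapshot test concrete, here is a minimal sketch in Python. It is not the harness used in the `ops` repo: `render_library` is a hypothetical stand-in for evaluating a single jsonnet library function, and `check_snapshot` and the file layout are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def render_library(replicas: int) -> dict:
    """Hypothetical stand-in for evaluating one jsonnet library function."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "example"},
        "spec": {"replicas": replicas},
    }


def check_snapshot(name: str, rendered: dict, snapshot_dir: Path, update: bool = False) -> bool:
    """Compare rendered output with the stored snapshot.

    The snapshot is written when it is missing or when update=True;
    otherwise the function reports whether the output still matches.
    """
    path = snapshot_dir / f"{name}.json"
    # Sorted keys and fixed indentation keep the snapshot stable and diffable.
    text = json.dumps(rendered, indent=2, sort_keys=True) + "\n"
    if update or not path.exists():
        path.write_text(text)
        return True
    return path.read_text() == text


with tempfile.TemporaryDirectory() as tmp:
    snapshots = Path(tmp)
    first = check_snapshot("deployment", render_library(2), snapshots)    # records the snapshot
    same = check_snapshot("deployment", render_library(2), snapshots)     # unchanged output
    changed = check_snapshot("deployment", render_library(3), snapshots)  # a library change shows up here
```

Because each snapshot covers only one unit, a failing comparison points directly at the library function whose output moved, rather than at a whole rendered environment.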
All credentials have been switched over to `sops`, and git-secret has been removed. All environments use a common pattern of a `secrets.yaml` which is processed directly (with still-encrypted secrets) when rendering manifests, and which is then decrypted only when applying manifests to clusters.
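One consequence of this pattern is that rendered manifests still carry ciphertext. As a rough illustration, the sketch below walks a rendered manifest and lists the fields that still hold sops-style `ENC[...]` values and so would need decrypting at apply time. The manifest contents and the `encrypted_paths` helper are hypothetical, and this is not how the `ops` repo itself performs decryption.

```python
from typing import Any, List

# sops replaces each encrypted leaf value with a string of the form "ENC[...]".
SOPS_PREFIX = "ENC["


def encrypted_paths(node: Any, prefix: str = "") -> List[str]:
    """Return dotted paths of all string leaves that are still sops-encrypted."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            found.extend(encrypted_paths(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(encrypted_paths(value, f"{prefix}[{index}]"))
    elif isinstance(node, str) and node.startswith(SOPS_PREFIX):
        found.append(prefix)
    return found


# Hypothetical rendered manifest: the secret value passed through rendering untouched.
rendered = {
    "kind": "Secret",
    "metadata": {"name": "example"},
    "stringData": {
        "password": "ENC[AES256_GCM,data:abc,iv:def,tag:ghi,type:str]",
        "username": "plain",
    },
}

paths = encrypted_paths(rendered)
```

A check like this could serve as a guard in either direction: confirming that no plaintext secret leaked into rendered output, or that nothing encrypted is left after the decrypt step.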
I've added continuous deployment to the repository as well, setting up Workload Identity Federation to do so. A green build of the `master` branch will apply manifests to estuary-owned clusters, and then prune the cluster. The `ops` repo is now the source-of-truth for what's running.
I've updated many, but not all, of our third-party dependencies in the instrumentation stack: promtail, grafana agent, node-exporter, kube-state-metrics. I've not updated Cortex or Grafana.
I've done some, but not all, of the refactoring to our libraries which would probably be appropriate.
I have not factored out our libsonnets into a fully public repo. I didn't get there, and do have very slight reservations it may worsen the development workflow.
The primary goals of this work -- switching to `sops`, cleaning things up, applying updates for the most egregiously out-of-date components, and making it practical for any developer to contribute to the `ops` repo -- have been accomplished, and I think it's time to put this one down.
This issue is a bit of a grab-bag of deferred work and chores I'm knocking out related to our operations repo:
The `ops` repo saw heavy development about a year and a half ago, when we first stood up our managed control plane and monitoring infrastructure. It's still working fine, but many of our dependencies have seen significant updates and fixes that we're not taking advantage of.
We've also adopted `sops` for credential management for Flow, and wish to also use it for our own operations (deployments, etc) as it's far easier to use with a growing team as compared to git-secret.
Jsonnet is very powerful but exhibits a lot of "spooky action at a distance". There's real concern about how a team can onboard to our jsonnet infrastructure. It seems like we can mitigate this with some tooling improvements, and also by adding snapshot-based testing of our jsonnet libraries in their current form. Snapshots would be tightly scoped to the unit under development, making it (hopefully) a lot easier to figure out cause and effect.
The Gazette repository has significant kustomize infrastructure which was originally intended as a baseline for deployments, and is still the means by which soak tests are run. We'd like to consolidate this with our use of Jsonnet in new infra development.
Finally, our jsonnet libraries can be open sourced, which would also make them useful for direct Gazette users.