Initial Commit of Promenade Script

aric49 commented 7 years ago

Initial commit of the Promenade deployment script.

v1k0d3n commented 7 years ago

This is a great start. Looking over the script, many of the concepts are similar to halcyon (as I've already stated). If this is for highly available, resilient environments, and considering that this is a ground-up approach, I would recommend that this effort start off more like this: https://github.com/kelseyhightower/kubernetes-the-hard-way

kubeadm, at this point, will introduce many variables that will make upgrades of our control plane very difficult. Just a thought.

v1k0d3n commented 7 years ago

And to be perfectly clear: etcd. At least start off correctly with etcd; then things can expand from there. Use a 3-node etcd cluster so that Raft reaches consensus with a quorum of 2 and the cluster stays available if one node fails.
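The quorum arithmetic behind the 3-node recommendation can be sketched as follows. This is a minimal illustration of Raft majority sizing, not code from Promenade or etcd; the function names are hypothetical.

```python
def raft_quorum(members: int) -> int:
    """Smallest majority of `members` voting nodes needed to commit a write."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - raft_quorum(members)


if __name__ == "__main__":
    # Compare cluster sizes: even sizes raise the quorum without adding tolerance.
    for n in range(1, 8):
        print(f"{n} members: quorum {raft_quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

A 3-member cluster has a quorum of 2 and survives one failure; growing to 4 members raises the quorum to 3 but still tolerates only one failure, which is why odd sizes such as 3 or 5 are the usual choice.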

intlabs commented 7 years ago

This looks like a fantastic first pass at getting a k8s environment up and running. But as @v1k0d3n says above, I think we should tackle the etcd cluster that backs K8s first, and get a really solid foundation there before moving further up the stack.

To do this, we'll need to consider a few things, starting with HA but also going a bit deeper (please bear in mind I'm not yet familiar with the failure and threat scenarios, or the hardware we expect to encounter):

- TLS and authentication (possibly requiring us to start with a basic CA or integrate with existing infrastructure)
- Recovery from power outages and node failure
- Cluster storage, which becomes critical under high-load events
- Port and IP allocation (unless we have external DNS)
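To make the TLS and port/IP items concrete, here is a minimal sketch that generates command-line flags for a statically bootstrapped 3-member etcd cluster with client and peer TLS. The member names, IP addresses, certificate paths and cluster token are hypothetical placeholders; the flags themselves are standard etcd options, and 2379/2380 are etcd's default client and peer ports.

```python
from dataclasses import dataclass

CLIENT_PORT = 2379  # etcd's default client port
PEER_PORT = 2380    # etcd's default peer (member-to-member) port

# Hypothetical location of certificates issued by a basic cluster CA.
PKI_DIR = "/etc/etcd/pki"


@dataclass(frozen=True)
class Member:
    name: str
    ip: str


# Hypothetical static IP allocation for the three members.
MEMBERS = [
    Member("etcd-0", "10.0.0.10"),
    Member("etcd-1", "10.0.0.11"),
    Member("etcd-2", "10.0.0.12"),
]


def initial_cluster(members):
    """Build the --initial-cluster value: name=peer-URL pairs, comma-separated."""
    return ",".join(f"{m.name}=https://{m.ip}:{PEER_PORT}" for m in members)


def etcd_args(member, members):
    """Return the etcd flags for one member of a static, TLS-only cluster."""
    client_url = f"https://{member.ip}:{CLIENT_PORT}"
    peer_url = f"https://{member.ip}:{PEER_PORT}"
    return [
        f"--name={member.name}",
        f"--data-dir=/var/lib/etcd/{member.name}",
        f"--listen-client-urls={client_url}",
        f"--advertise-client-urls={client_url}",
        f"--listen-peer-urls={peer_url}",
        f"--initial-advertise-peer-urls={peer_url}",
        f"--initial-cluster={initial_cluster(members)}",
        "--initial-cluster-token=promenade-etcd",  # hypothetical token
        "--initial-cluster-state=new",
        # Client TLS: only clients with certs signed by the CA may connect.
        f"--cert-file={PKI_DIR}/{member.name}.pem",
        f"--key-file={PKI_DIR}/{member.name}-key.pem",
        f"--trusted-ca-file={PKI_DIR}/ca.pem",
        "--client-cert-auth=true",
        # Peer TLS: members authenticate each other with the same CA.
        f"--peer-cert-file={PKI_DIR}/{member.name}-peer.pem",
        f"--peer-key-file={PKI_DIR}/{member.name}-peer-key.pem",
        f"--peer-trusted-ca-file={PKI_DIR}/ca.pem",
        "--peer-client-cert-auth=true",
    ]


if __name__ == "__main__":
    print("etcd " + " \\\n  ".join(etcd_args(MEMBERS[0], MEMBERS)))
```

Static bootstrapping like this hard-codes every member's address, which is why IP allocation matters when there is no external DNS; if DNS is available, etcd can instead discover its peers through DNS SRV records (`--discovery-srv`).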

Once we have this nailed down, building out highly available, scalable and secure k8s on top of it should be really easy :)

alanmeadows commented 7 years ago

The items you mention, @intlabs, need to be understood and documented.

Initially, the intent was to start with kubeadm and wrap it to add the resiliency it doesn't natively support.

That said, after careful analysis of the kubeadm gaps and what we will need to solve for, I see a great advantage in adopting Kargo for production installations while we simultaneously work to add these features to kubeadm natively in an entirely separate track. Kargo solves many, if not all, of kubeadm's production resiliency gaps, and its roadmap names kubeadm as the target for the actual K8s bootstrapping. Aric's work here will inform that approach, because I am not clear on whether Kargo does everything we need (for instance, all bare-metal hosts using SkyDNS endpoints), and the automation of its installation (which Promenade will pivot to) will need to account for that.

I have invited @janwillies to work with @aric49 on this effort.

In any event, I do not want any of the above to affect the validity of this pull request, because this effort will be an evolution of Promenade from kubeadm to automating Kargo and ensuring our needs are met there.

att-comdev / promenade

Initial Commit of Promenade Script #1