Verify that kubernetes cluster is resilient to upgrades and disruptions

cloud-gov / cg-atlas

Repository hosting issues and artifacts related to operations of the cloud.gov platform

Creative Commons Zero v1.0 Universal


Verify that kubernetes cluster is resilient to upgrades and disruptions #154

Closed by jmcarp 8 years ago

jmcarp commented 8 years ago

In order to avoid disrupting customers, we want to ensure we can monkey with the k8s deployment without impacting the availability of existing k8s-brokered services.

Acceptance Criteria

When any of these events occurs...

[ ] Upgrade kubernetes version
[x] Upgrade stemcell
[x] Destroy minion

...k8s-brokered services remain available with all their data intact.
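One way to check these criteria could look like the sketch below: write a canary record before the event, probe the service while the event runs, then read the canary back. Everything here is an assumption made for illustration: `write`, `read`, `is_available`, and `trigger_event` are hypothetical callables that would wrap whatever client the brokered service exposes and whatever triggers the stemcell upgrade or minion destruction. None of them come from this repository.

```python
import time
from typing import Callable, Optional, Tuple


def check_resilience(
    write: Callable[[str, str], None],        # hypothetical: stores a value in the service
    read: Callable[[str], Optional[str]],     # hypothetical: reads a value back
    is_available: Callable[[], bool],         # hypothetical: one availability probe
    trigger_event: Callable[[], None],        # hypothetical: starts the upgrade / node kill
    probes: int = 10,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bool]:
    """Return (number of failed probes, whether the canary data survived)."""
    canary_key, canary_value = "canary-key", "canary-value"

    # Store known data before disrupting anything.
    write(canary_key, canary_value)

    # Kick off the disruptive event.
    trigger_event()

    # Probe the service a fixed number of times and count the failures.
    failed = 0
    for _ in range(probes):
        if not is_available():
            failed += 1
        sleep(interval)

    # The data counts as intact only if the canary reads back unchanged.
    data_intact = read(canary_key) == canary_value
    return failed, data_intact
```

A run would pass the acceptance criteria if it returns `(0, True)`. A fixed probe count is used only to keep the sketch short; a real check would likely keep probing until the event has finished.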

rogeruiz commented 8 years ago

Upgrading the Kubernetes version itself is a bit more complicated. Will split that into another story for now.

cnelson commented 8 years ago

Some notes from my testing:

Pods that need to migrate to a new node (if the node fails, or is replaced as part of a rolling upgrade) take on average 3 minutes to become available again.

The longest time I saw in my testing was 9 minutes for a pod to become available after triggering a reschedule.

Most of this is due to the time it takes AWS EBS volumes to detach from one instance and attach to another.
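For anyone wanting to repeat this kind of measurement, here is a minimal sketch of timing how long a pod takes to come back after a reschedule. It is not the script used for the numbers above. `is_available` is a hypothetical probe (for example, a TCP connect or a query against the brokered service), and the 15-minute timeout and 5-second interval are arbitrary assumptions.

```python
import time
from typing import Callable, Optional


def time_until_available(
    is_available: Callable[[], bool],          # hypothetical probe of the service
    timeout: float = 900.0,                    # assumed give-up time, in seconds
    interval: float = 5.0,                     # assumed delay between probes, in seconds
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until the service answers; return elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        if is_available():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

Call it right after triggering the reschedule. The result is only accurate to within one `interval`, so a shorter interval gives a tighter measurement at the cost of more probes. `clock` and `sleep` are injectable so the function can be exercised without waiting in real time.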