cloud-gov / cg-atlas

Repository hosting issues and artifacts related to operations of the cloud.gov platform
Creative Commons Zero v1.0 Universal

Verify that kubernetes cluster is resilient to upgrades and disruptions #154

Closed: jmcarp closed this issue 7 years ago

jmcarp commented 8 years ago

In order to avoid disrupting customers, we want to ensure we can monkey with the k8s deployment without impacting the availability of existing k8s-brokered services.

Acceptance Criteria

When any of these events occurs...

...k8s-brokered services remain available with all their data intact.

rogeruiz commented 7 years ago

Upgrading the version of k8s itself is a bit more complicated. Will split that into another story for now.

cnelson commented 7 years ago

Some notes from my testing:

Pods that need to migrate to a new node (if the node fails, or is replaced as part of a rolling upgrade) take on average 3 minutes to become available again.

The longest time I saw in my testing was 9 minutes for a pod to become available after triggering a reschedule.

Most of this is due to the time it takes AWS EBS volumes to detach from one instance and attach to another.
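
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of how the numbers could be derived. It assumes you already have a log of periodic availability probes against a brokered service, recorded as (timestamp, succeeded) pairs. The function names and the sample data are hypothetical, not something used in the testing above.

```python
from typing import Dict, Iterable, List, Tuple

# One probe result: (unix timestamp in seconds, whether the probe succeeded).
Sample = Tuple[float, bool]


def outage_durations(samples: Iterable[Sample]) -> List[float]:
    """Return the length in seconds of each contiguous run of failed probes.

    An outage starts at the first failed probe and ends at the next
    successful one. An outage still open when the samples end is ignored,
    because its true length is unknown.
    """
    durations: List[float] = []
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp  # service just became unavailable
        elif ok and outage_start is not None:
            durations.append(timestamp - outage_start)  # service recovered
            outage_start = None
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Return the count, mean, and maximum of the outage durations."""
    if not durations:
        return {"count": 0, "mean_s": 0.0, "max_s": 0.0}
    return {
        "count": len(durations),
        "mean_s": sum(durations) / len(durations),
        "max_s": max(durations),
    }


# Made-up probe log for illustration only: two outages of 180 s and 540 s.
example_samples = [
    (0, True), (10, False), (20, False), (190, True),
    (200, True), (210, False), (750, True),
]
example_summary = summarize(outage_durations(example_samples))
```

The measured outage can only be as precise as the probe interval, so a probe every few seconds is enough to tell a 3 minute reschedule from a 9 minute one.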