cloudfoundry / concourse-infra-for-fiwg

This repo holds the deployment tooling used to deploy a Concourse instance for the Foundation Infrastructure Working Group

Using preemptible instances for workers can cause false-positive failures #39

Closed: ramonskie closed this issue 3 years ago

ramonskie commented 3 years ago

When using preemptible VMs (https://cloud.google.com/compute/docs/instances/preemptible), the instances can be restarted without notice at any time. The worker then disappears for a short time, which results in errors such as "volume not found" and "worker concourse-worker-* disappeared while trying to reach it".
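To tell these infrastructure failures apart from genuine test failures, one option is to match build output against the two error messages quoted above. The sketch below is a minimal illustration only: the helper name `is_preemption_failure` and the pattern list are hypothetical, and treating these messages as "worker was preempted" is an assumption, not something Concourse reports directly.

```python
import fnmatch

# Error messages quoted in this issue; the "*" inside the second pattern
# stands in for the worker name. Classifying them as preemption-related
# is an assumption made for illustration.
PREEMPTION_PATTERNS = [
    "*volume not found*",
    "*worker concourse-worker-* disappeared while trying to reach it*",
]


def is_preemption_failure(error_text: str) -> bool:
    """Hypothetical helper: True if a build error looks like a lost worker
    rather than a genuine test failure."""
    return any(fnmatch.fnmatchcase(error_text, p) for p in PREEMPTION_PATTERNS)


if __name__ == "__main__":
    print(is_preemption_failure(
        "worker concourse-worker-3 disappeared while trying to reach it"))
    print(is_preemption_failure("expected 200, got 500"))
```

A check like this could feed a retry or alerting decision, but it only recognizes the two messages seen so far; other symptoms of a preempted worker would need their own patterns.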

For the stemcell and BATS tests, this means that leftover instances are not being cleaned up, so this needs to be handled.

We could probably prevent many of these outages by restarting the VMs on our own terms within the 24-hour period; see https://github.com/estafette/estafette-gke-preemptible-killer and https://gist.github.com/ahume/d56699f3eb2292dbbc1ba3825d44e4b5
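Compute Engine stops a preemptible VM after at most 24 hours, so a controlled restart has to happen before that limit. One way to do this, sketched below under stated assumptions, is to give each worker a randomized restart time in the second half of its lifetime, so the workers do not all restart at once. The helper `planned_restart`, the 12-hour lower bound, and the 2-hour safety margin are hypothetical choices; actually draining and stopping the worker is left out.

```python
import random
from datetime import datetime, timedelta, timezone

# Compute Engine stops preemptible VMs after at most 24 hours.
MAX_LIFETIME = timedelta(hours=24)
# Assumption: restart well before the hard limit.
SAFETY_MARGIN = timedelta(hours=2)


def planned_restart(created_at: datetime, rng: random.Random) -> datetime:
    """Hypothetical helper: pick a restart time between 12h and
    (24h - margin) after creation, spread randomly across workers."""
    earliest = created_at + MAX_LIFETIME / 2
    latest = created_at + MAX_LIFETIME - SAFETY_MARGIN
    window_seconds = (latest - earliest).total_seconds()
    return earliest + timedelta(seconds=rng.uniform(0, window_seconds))


if __name__ == "__main__":
    created = datetime(2021, 1, 1, tzinfo=timezone.utc)
    for seed in range(3):
        print(planned_restart(created, random.Random(seed)).isoformat())
```

Randomizing the restart time avoids several workers disappearing together, which would otherwise recreate the same outage the restarts are meant to prevent.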