Closed. Kalmalyzer closed this issue 3 years ago.
GCE lets us see which operations (insert, delete, start, stop) are in progress for instances. By tracking, within Jenkins, when insert/delete operations begin, and querying GCE to find out when they complete, we can avoid these race conditions. Fixed by https://github.com/falldamagestudio/UE-Jenkins-Images/commit/b97852edff3c220305f9d556bc11421de43c286c.
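A minimal sketch of that tracking idea, not the code from the linked commit. The class `PendingOperations` and the `get_status` callable are hypothetical; `get_status` stands in for whatever query against GCE reports the state of an instance's operation, and the sketch assumes it returns `"DONE"` once the operation has completed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PendingOperations:
    """Remembers insert/delete operations that were started but not yet confirmed done."""

    # Maps instance name -> operation type ("insert" or "delete") still in flight.
    in_flight: Dict[str, str] = field(default_factory=dict)

    def begin(self, instance: str, op_type: str) -> None:
        # Record the operation at the moment it is requested.
        self.in_flight[instance] = op_type

    def refresh(self, get_status: Callable[[str], str]) -> None:
        # get_status is a hypothetical stand-in for a GCE query;
        # assumed to return "DONE" when the operation has completed.
        for instance in list(self.in_flight):
            if get_status(instance) == "DONE":
                del self.in_flight[instance]

    def is_busy(self, instance: str) -> bool:
        # True while an insert/delete for this instance is still running.
        return instance in self.in_flight
```

Provisioning and deprovisioning decisions can then skip, or account for, any instance for which `is_busy` returns true.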
If the GCE plugin stops lots of nodes at the same time, it can decide to delete more VMs than necessary in one go.
In the example below, a delete is in progress (but not yet completed) at the time that the persistence check concludes, "nope, still too many nodes around, let's delete the second node as well".
The problem here is probably that the deprovisioning logic counts a node that is in the process of being deleted as +1 node when counting the number of nodes.
As a result, the Jenkins build system does not reach a "steady state" during normal use; it deletes and recreates nodes every now and then, so jobs do not reach a predictably low launch duration.
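The suspected miscount can be illustrated with a small sketch. This is not the plugin's actual logic; the function names, the `deleting` set, and the `desired` count are all assumptions made for illustration. The naive version counts every visible node, while the corrected version ignores nodes whose delete is already in progress.

```python
from typing import List, Set


def excess_nodes_naive(nodes: List[str], desired: int) -> int:
    # Counts every visible node, including ones already being deleted.
    return max(0, len(nodes) - desired)


def excess_nodes_corrected(nodes: List[str], deleting: Set[str], desired: int) -> int:
    # Nodes with a delete in progress will disappear soon, so leave them out.
    remaining = [n for n in nodes if n not in deleting]
    return max(0, len(remaining) - desired)


# Two nodes are visible, one is mid-delete, and one node should remain.
nodes = ["agent-1", "agent-2"]
deleting = {"agent-1"}

naive = excess_nodes_naive(nodes, desired=1)
corrected = excess_nodes_corrected(nodes, deleting, desired=1)
```

Here the naive count still reports one node too many and would delete `agent-2` as well, whereas the corrected count reports no excess.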