falldamagestudio / UE-Jenkins-BuildSystem

Build Unreal Engine & games with Jenkins on GKE/GCE
MIT License

GCE plugin sometimes deletes more dynamic VMs than it should #42

Closed Kalmalyzer closed 3 years ago

Kalmalyzer commented 3 years ago

If the GCE plugin stops lots of nodes at the same time, it can decide to delete more VMs than necessary in one go.

In the example below, the delete of one node is still in progress (not yet completed) when the persistence check for the other node concludes "nope, still too many nodes around, let's delete the second node as well".

The problem here is probably that the deprovisioning logic counts a node that is in the process of being deleted as +1 node when counting the number of nodes.

As a result, the Jenkins build system does not reach a "steady state" during normal use; instead it deletes and recreates nodes every now and then, so jobs do not reach a predictably low launch duration.
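The suspected miscount can be illustrated with a small simulation. This is only a sketch of the hypothesis above, not the plugin's actual code: `Instance`, `should_persist_naive` and `simulate_naive` are hypothetical names, and the only facts taken from the issue are the two instance names and the "max 1 instances to persist" setting.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    deleting: bool = False  # delete operation started but not yet completed


def should_persist_naive(instances, max_persistent):
    # Assumed flaw: every instance that still exists is counted,
    # including instances whose delete operation is already in progress.
    return len(instances) <= max_persistent


def simulate_naive(max_persistent=1):
    instances = [
        Instance("build-game-linux-dynamic-7xzq63"),
        Instance("build-game-linux-dynamic-mjbsbf"),
    ]
    deleted = []
    for inst in instances:
        if not should_persist_naive(instances, max_persistent):
            # The delete starts here, but the instance stays in the list
            # until the operation completes, so the next check still sees 2.
            inst.deleting = True
            deleted.append(inst.name)
    return deleted
```

With two stopped instances and a limit of one, `simulate_naive()` marks both instances for deletion, which matches the behaviour seen in the log.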

Aug 03, 2021 3:21:58 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminate
Stopping instance build-game-linux-dynamic-mjbsbf
Aug 03, 2021 3:21:59 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Waiting for stop operation for instance build-game-linux-dynamic-mjbsbf to complete
Aug 03, 2021 3:21:59 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminate
Stopping instance build-game-linux-dynamic-7xzq63
Aug 03, 2021 3:22:00 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Waiting for stop operation for instance build-game-linux-dynamic-7xzq63 to complete
Aug 03, 2021 3:23:46 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Instance configuration build-game-linux-dynamic specifies max 1 instances to persist: there are currently 2 for that instance configuration; the instance will not be persisted
Aug 03, 2021 3:23:46 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Deleting instance build-game-linux-dynamic-7xzq63
Aug 03, 2021 3:23:46 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Waiting for delete operation for instance build-game-linux-dynamic-7xzq63 to complete
Aug 03, 2021 3:23:51 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Instance configuration build-game-linux-dynamic specifies max 1 instances to persist: there are currently 2 for that instance configuration; the instance will not be persisted
Aug 03, 2021 3:23:51 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Deleting instance build-game-linux-dynamic-mjbsbf
Aug 03, 2021 3:23:51 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Waiting for delete operation for instance build-game-linux-dynamic-mjbsbf to complete
Aug 03, 2021 3:23:51 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Deleting instance build-game-linux-dynamic-7xzq63 done
Aug 03, 2021 3:23:56 PM INFO com.google.jenkins.plugins.computeengine.ComputeEngineInstance _terminateThreadedWork
Deleting instance build-game-linux-dynamic-mjbsbf done

Kalmalyzer commented 3 years ago

GCE lets us see which operations (insert, delete, start, stop) are in progress for instances. By tracking, within Jenkins, when insert/delete operations begin, and querying GCE to find out when they complete, we can avoid these race conditions. Fixed by https://github.com/falldamagestudio/UE-Jenkins-Images/commit/b97852edff3c220305f9d556bc11421de43c286c.
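A minimal sketch of that idea follows. It is not the code from the linked commit: `FakeCloud` is a hypothetical in-memory stand-in for the GCE API, and `Deprovisioner` with its methods is an assumed shape for the Jenkins-side tracking. The point is only that instances with a delete in flight are excluded from the count until GCE reports the operation as complete.

```python
class FakeCloud:
    """Hypothetical in-memory stand-in for GCE (not the real API)."""

    def __init__(self, names):
        self.instances = set(names)
        self.running_deletes = set()

    def start_delete(self, name):
        # The instance remains listed while the delete operation runs.
        self.running_deletes.add(name)

    def complete_delete(self, name):
        self.running_deletes.discard(name)
        self.instances.discard(name)

    def is_delete_done(self, name):
        return name not in self.running_deletes


class Deprovisioner:
    """Assumed Jenkins-side logic that tracks the deletes it has started."""

    def __init__(self, cloud, max_persistent):
        self.cloud = cloud
        self.max_persistent = max_persistent
        self.pending_deletes = set()

    def effective_count(self):
        # Ask the cloud which tracked deletes have finished, then count
        # only instances that are not currently being deleted.
        self.pending_deletes = {
            n for n in self.pending_deletes if not self.cloud.is_delete_done(n)
        }
        return len(self.cloud.instances - self.pending_deletes)

    def on_stopped(self, name):
        """Return True if the instance is kept, False if a delete was started."""
        if self.effective_count() <= self.max_persistent:
            return True
        self.pending_deletes.add(name)
        self.cloud.start_delete(name)
        return False
```

With two stopped instances and a limit of one, the first call to `on_stopped` starts a delete and the second call keeps its instance, because the in-flight delete no longer counts toward the total.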