adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

Alpine build nodes not dynamically provisioning, builds hanging #2988

Open andrew-m-leonard opened 1 year ago

andrew-m-leonard commented 1 year ago

The nightly builds are hanging because the Alpine platform is failing to provision dynamic nodes: https://ci.adoptium.net/computer/azurebuildagentc16b00/

https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-alpine-linux-x64-temurin/

sxa commented 1 year ago

I'm a bit confused as to why we're using the alpine-linux label for these builds. Since the Alpine container can run on any Linux host capable of running containers, the label expression build&&linux&&x64&&dockerbuild should be adequate, and it would allow the builds to run on our other machines without the dependency on Azure. See https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-alpine-linux-x64-temurin/185/ which is now chugging along quite nicely after I started it with different labels.

Regardless, I've deprovisioned the hanging agent so we'll see if it's repeatable.
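
To illustrate the label point above, here is a minimal sketch of how an &&-only label expression selects nodes. It is not Jenkins' own matcher (real label expressions also support ||, ! and parentheses), and the two label sets are hypothetical, not the actual labels on the Adoptium machines.

```python
def matches(expression: str, node_labels: set[str]) -> bool:
    """Return True if every label in an &&-only expression is on the node."""
    required = [term.strip() for term in expression.split("&&")]
    return all(term in node_labels for term in required)


# Hypothetical label sets, for illustration only.
docker_host = {"build", "linux", "x64", "dockerbuild"}
azure_agent = {"build", "linux", "x64", "dockerbuild", "alpine-linux"}

generic = "build&&linux&&x64&&dockerbuild"

# The generic expression is satisfied by both kinds of node, so a job using it
# does not have to wait for a dynamically provisioned agent.
print(matches(generic, docker_host), matches(generic, azure_agent))

# A job pinned to alpine-linux can only run where that label exists.
print(matches("alpine-linux", docker_host), matches("alpine-linux", azure_agent))
```

Under these assumed labels, a job requesting only alpine-linux has nowhere to run when the dynamic agents fail to provision, which matches the hang described in this issue.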

sxa commented 1 year ago

While I'm loath to do it, I could stick the alpine-linux label onto the x64 dockerbuild systems, but it's not really accurate.

sxa commented 1 year ago

Hmmm azurebuildagentc16b00 had apparently been running for 9 months.

sxa commented 1 year ago

OK, we haven't had any dynamic provisions since the Jenkins upgrade attempt yesterday, by the look of it. We seem to have lost the Azure configuration.

sxa commented 1 year ago

I've reinstated the Azure configuration in the main config.xml of the Jenkins server and reloaded it. The cloud is now showing as active and provisioning at https://ci.adoptium.net/cloud-stats/
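
A quick check after an upgrade could catch a lost cloud configuration like the one described above. The following is only a sketch: it assumes the main config.xml keeps cloud definitions as children of a clouds element under the root, and the sample documents and the example.plugin.ExampleCloud tag are made up for illustration rather than copied from the real server. Real Jenkins files may also declare XML version 1.1, which Python's parser may reject unless the declaration is adjusted first.

```python
import xml.etree.ElementTree as ET


def configured_clouds(config_xml: str) -> list[str]:
    """Return the tag names of the entries under <clouds>, or [] if none."""
    root = ET.fromstring(config_xml)
    clouds = root.find("clouds")
    return [] if clouds is None else [child.tag for child in clouds]


# Hypothetical, heavily trimmed samples (not the real Adoptium config).
SAMPLE_WITH_CLOUD = (
    "<hudson><clouds>"
    "<example.plugin.ExampleCloud><name>azure</name></example.plugin.ExampleCloud>"
    "</clouds></hudson>"
)
SAMPLE_WITHOUT_CLOUD = "<hudson><clouds/></hudson>"

for label, sample in [("before", SAMPLE_WITH_CLOUD), ("after", SAMPLE_WITHOUT_CLOUD)]:
    found = configured_clouds(sample)
    print(label, "clouds:", found if found else "none configured")
```

An empty result after an upgrade would suggest the cloud section needs to be restored before dynamic agents can be provisioned again.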

sxa commented 1 year ago

Not quite sure why it's felt the need to fire up seven of them simultaneously but 🤷🏻

sxa commented 1 year ago

Lots of Alpine PR tester jobs apparently

sxa commented 1 year ago

Anyway, back up and running so I'll close this, although I would prefer us to change the labels.

sxa commented 1 year ago

This has recurred, possibly after the Jenkins upgrade last Thursday.

andrew-m-leonard commented 1 year ago

To work around the possibility of alpine-linux jobs hanging builds indefinitely, the following node now has the alpine-linux label on it:

dockerhost-equinix-ubuntu2204-x64-1

sxa commented 1 year ago

> To work around the possibility of alpine-linux jobs hanging builds indefinitely, the following node now has the alpine-linux label on it: dockerhost-equinix-ubuntu2204-x64-1

Let's be clear: that is a temporary workaround and, as per my earlier comment, I don't want that machine labelled as alpine-linux as a permanent solution to this.