JuliaCI / julia-buildbot

Buildbot configuration for build.julialang.org
MIT License

juno signing jobs are frozen and repeating indefinitely #36

tkelman closed 8 years ago

tkelman commented 8 years ago

@staticfloat and @one-more-minute, please fix this ASAP: these jobs are occupying the buildbot for days at a time and not working. We urgently need to build other Windows binaries for testing right now, and the juno jobs are refusing to obey cancel commands.

staticfloat commented 8 years ago

Short-term fix in place. I had to remote-desktop into the Windows slaves and run buildslave_stop; only then could I cancel the pending build in the buildmaster's web UI. Once I had canceled the pending juno signing build, I restarted the buildslaves with buildslave_start, and things are hunky-dory again.

staticfloat commented 8 years ago

I've got a workaround pushed, so this shouldn't happen again in the future. Two issues left pending:

tkelman commented 8 years ago

Great, thanks so much for the responsiveness here. It looks like the caching server is still not working properly from the buildbots, and there's some upstream openSUSE mirror flakiness (see http://buildbot.e.ip.saba.us:8010/builders/package_win6.2-x86/builds/269/steps/make%20win-extras/logs/stdio ). But I should now have the necessary permissions to fix #35, which I will try to figure out over the next few days.

staticfloat commented 8 years ago

Yes, I have been unable to figure out the Windows HTTPS issues. It's possible we should host cache.julialang.org on AWS instead of OpenStack; I'm thinking there might be some issue with the hosts being on the same network as each other.

staticfloat commented 8 years ago

Buildbot key access has been restored.

tkelman commented 8 years ago

The OSX job has been running for 61 hours: http://buildbot.e.ip.saba.us:8010/builders/juno_osx10.9-x64/builds/3 . These jobs need a timeout of some kind.

staticfloat commented 8 years ago

I don't think buildbot's timeout functionality is very robust, and if the "interrupt" functionality doesn't stop something, then buildbot's built-in timeout certainly won't either. As soon as 0.9 reaches final, I'm going to upgrade everything; hopefully that will shuffle around some of our bugs since, from what I can tell, 0.9 is a fairly substantial rewrite.
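For reference, a hard wall-clock limit enforced outside buildbot could look roughly like the sketch below. This is not buildbot's own timeout mechanism and not what is deployed here; it is a generic Python wrapper, and the function name `run_with_deadline` and the example command are made up for illustration.

```python
import subprocess
import sys


def run_with_deadline(cmd, max_seconds):
    """Run cmd and kill it if it exceeds max_seconds of wall-clock time.

    Returns (exit_code, timed_out); exit_code is None when the job was killed.
    Hypothetical helper, not part of buildbot.
    """
    proc = subprocess.Popen(cmd)
    try:
        # wait() raises TimeoutExpired if the child is still running
        # after max_seconds.
        return proc.wait(timeout=max_seconds), False
    except subprocess.TimeoutExpired:
        proc.kill()  # forceful kill of the direct child process
        proc.wait()  # reap the child so it does not linger as a zombie
        return None, True


if __name__ == "__main__":
    # Stand-in for a stuck job: sleeps far longer than the 2-second limit.
    stuck_job = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_deadline(stuck_job, 2))
```

Note that `proc.kill()` only kills the direct child. If the job spawns its own subprocesses, those would need to be handled separately, for example by running the job in its own process group.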

staticfloat commented 8 years ago

In the meantime, I logged into the OSX 10.9 buildbot and forcibly restarted the buildslave.

tkelman commented 8 years ago

... and a new juno job started first: http://buildbot.e.ip.saba.us:8010/builders/juno_osx10.9-x64/builds/4

staticfloat commented 8 years ago

Lucky us, it looks like the "upload" step of a buildbot job cannot be stopped.

tkelman commented 8 years ago

I'll call this working well enough for now? The jobs today went through okay.