cockroach-teamcity closed this issue 6 years ago
@nvanbenschoten This is because you had a manual test running at the time the nightly was triggered, right?
It looks like it is, yes. I'm not sure how that happened, though, because my tests were failing pretty quickly last night due to the `Unknown option: "--tarball"`/`--package-url` issue. I'm kicking off a run on master now to see if it's affected.
Saw this again when I tried to run Jepsen on master (#20125).
Looks like a run failed to clean up after itself. Did you cancel a run at some point?
I'm deleting the GCE resources and trying another build.
Yes, I think I did cancel a run sometime yesterday, once it became clear that the run wasn't going to succeed.
OK, Jepsen versioning is very confusing: it looks like a monorepo but doesn't act like one, because the subpackages depend on published releases of each other, so my fix wasn't actually getting applied. I've got #20129 to fix this by adapting to the new flag name instead of fixing it upstream.
This happened again, with another cancelled manual build. We need to either make sure that this cleanup happens even if a build is cancelled (maybe move from terraform to roachprod?), or at least that we don't leave the orphaned machines around blocking future test runs (and costing money) for a week at a time.
We should probably also move to randomized resource names so we're not limited to one instance of these tests running at a time, but only after we've made sure they'll get cleaned up reliably.
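A minimal sketch of what randomized resource names could look like, assuming a per-run prefix is generated in the build script and handed to Terraform. The `jepsen` base name and the `resource_prefix` helper are hypothetical; the pattern it checks is the naming rule Compute Engine documents for resource names (a lowercase letter first, then lowercase letters, digits, or hyphens, not ending in a hyphen, at most 63 characters).

```python
import re
import secrets

# Naming rule Compute Engine applies to resource names (RFC 1035 style).
GCE_NAME_RE = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")


def resource_prefix(base: str = "jepsen", nbytes: int = 4) -> str:
    """Return a per-run prefix such as 'jepsen-1a2b3c4d' (hypothetical helper).

    The random hex suffix keeps concurrent runs from colliding on names.
    """
    name = f"{base}-{secrets.token_hex(nbytes)}"
    if len(name) > 63 or not GCE_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid GCE resource name: {name!r}")
    return name
```

The prefix could then be passed in with something like `terraform apply -var "prefix=..."`, which assumes the Terraform config declares a `prefix` variable (hypothetical) and uses it in every resource name.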
While the build linked in my previous comment was manual, its cancellation was not: "Canceled with comment: Agent removed". It got caught up in other TeamCity operations.
Maybe we should just run the cleanup step at the start of the process in addition to the end. This will ensure that we don't leave orphaned resources around for more than a day. But I'm not sure if that works, since `terraform destroy` relies on local state to know which resources have been created.
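A minimal sketch of a start-and-end cleanup wrapper, under stated assumptions. Because `terraform destroy` only knows about resources in its local state, the start-of-run sweep here selects leftovers by name prefix and age instead. `list_resources`, `delete`, `run_tests`, and `teardown` are hypothetical callables standing in for whatever cloud listing, deletion, and test steps the build actually uses, and the `jepsen-` prefix and 12-hour threshold are illustrative defaults. The `finally` block covers test failures and exceptions, but not a process that is killed outright (as with "Agent removed"), which is the case the start-of-run sweep is meant to catch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, List


@dataclass
class Resource:
    """A cloud resource as reported by a (hypothetical) listing call."""
    name: str
    created: datetime  # timezone-aware creation time


def stale_resources(
    resources: Iterable[Resource], prefix: str, max_age: timedelta, now: datetime
) -> List[str]:
    # Keep only this suite's resources (matched by name prefix) older than max_age.
    return [
        r.name
        for r in resources
        if r.name.startswith(prefix) and now - r.created > max_age
    ]


def run_build(
    list_resources: Callable[[], Iterable[Resource]],
    delete: Callable[[str], None],
    run_tests: Callable[[], None],
    teardown: Callable[[], None],
    prefix: str = "jepsen-",  # hypothetical prefix shared by all runs of the suite
    max_age: timedelta = timedelta(hours=12),  # illustrative threshold
) -> None:
    # Start-of-run sweep: removes leftovers from earlier runs that never tore
    # down, without relying on this run's local terraform state.
    now = datetime.now(timezone.utc)
    for name in stale_resources(list_resources(), prefix, max_age, now):
        delete(name)
    try:
        run_tests()
    finally:
        # End-of-run teardown: runs on success and on exceptions, but not if
        # the process is killed outright.
        teardown()
```

The age threshold is a design choice: once resource names are randomized and several runs can overlap, it keeps one run's sweep from deleting resources that belong to another run still in progress.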
The following tests appear to have failed:
#412373:
Please assign, take a look and update the issue accordingly.