Open rhs opened 7 years ago
What might work here is an option that tells `kubernaut claim` to wait a configurable time for the cluster to be released and then forcibly steal the claim if it is still not available. That way multiple concurrent jobs would wait for each other to finish, but you wouldn't have to worry about leaks.
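Until such an option exists, the wait-then-steal behavior could be approximated on the client side. The following is only a sketch: `claim_or_steal` and its two parameters are hypothetical names, and it assumes that `kubernaut claim` exits non-zero when the claim limit is reached and that `kubernaut discard` releases the current user's existing claim.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of kubernaut.
# Assumption: `kubernaut claim` returns a non-zero exit status when the
# user is already at the claim limit.

claim_or_steal() {
  local timeout="${1:-600}"   # seconds to wait before stealing (assumed default)
  local interval="${2:-15}"   # seconds between claim attempts (assumed default)
  local waited=0

  # Keep trying to claim until it works or the deadline passes.
  while ! kubernaut claim; do
    if [ "$waited" -ge "$timeout" ]; then
      # Deadline passed: release the existing claim, then claim again.
      kubernaut discard || true
      kubernaut claim
      return
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

A job could then call something like `claim_or_steal 900 30` before its tests. Note that the steal discards whatever claim the user currently holds, so a job that is still running against that cluster will lose it.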
I like the idea of having an example `.travis.yml` file kicking around in perhaps an `examples/travis` directory in this repository. We could also consider a Travis integration that automatically kills Kubernaut instances when a Travis job finishes, via some kind of Travis API polling mechanism for projects.
While that approach would work for multiple concurrent jobs, it means the CI process for unrelated projects is blocked by whichever other project holds the claim. That is probably undesirable.
Another option is increasing the claim limit for a user, or some combination of both.
kubernaut claim
You attempted to `kubernaut claim` a cluster but you are already at your maximum claim limit (1). Please release your existing claim `kubernaut discard` or wait until the existing claim expires.

I ran into the above error message when setting up a Travis job. The job barfed before I had a chance to discard my cluster. This seems like a thing that would commonly happen during initial setup.
I think documenting this somewhere, along with how to properly release resources in a Travis job, would be useful. (There is an `after_script` step that I think always runs, which can do cleanup even if the job fails.)
Parallel jobs would be an interesting problem as well, given this limit.
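For the cleanup problem, one possible shape is a small wrapper that the job runs instead of calling the tests directly. This is a sketch rather than a documented Kubernaut pattern: `run_with_claim` is a hypothetical name, and it assumes only the `kubernaut claim` and `kubernaut discard` commands mentioned in the error message above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of kubernaut: claim a cluster, run the
# given command, and release the claim whether or not the command succeeds.

run_with_claim() {
  # Give up early if no cluster could be claimed.
  kubernaut claim || return 1

  # Run the job in a subshell and capture its exit status without
  # letting a failure skip the cleanup below.
  local status=0
  ( "$@" ) || status=$?

  # Release the claim regardless of the result; ignore discard errors
  # so they do not mask the job's own exit status.
  kubernaut discard || true
  return "$status"
}
```

Usage would look like `run_with_claim make test`. This does not cover a job that is killed outright, so a `kubernaut discard` in `after_script` would probably still be worth keeping as a second line of defense.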