do not destroy test-kitchen instances if acceptance ended up failing

balanced-cookbooks / balanced-ci

Internal configuration for the Jekins CI environment

4 stars 1 forks source link

do not destroy test-kitchen instances if acceptance ended up failing #60

Closed mahmoudimus closed 10 years ago

mahmoudimus commented 10 years ago

ping @balanced-cookbooks/chefs

mjallday commented 10 years ago

lgtm

mahmoudimus commented 10 years ago

fixes #30

coderanger commented 10 years ago

This is functionally equivalent to the default of -d passing.
This is going to leak machines, who will clean them up?
The next time any build runs, it will forcibly kill any remaining instances (this is possibly the answer to #2, and we just eat the cost).

mjallday commented 10 years ago

we can add autoscaling to destroy these nodes after x minutes of doing nothing.

mahmoudimus commented 10 years ago

@coderanger - I originally wanted 2 make it just kitchen converge, kitchen verify and if successful, destroy it

My reasoning was you can just manually retrigger the acceptance job from jenkins when it fails and continually verify the instance until it passes (w/o incurring the cost of having 2 spin up an entire instance again)

coderanger commented 10 years ago

@mahmoudimus Not unreasonable but that isn't what this does :-)

mahmoudimus commented 10 years ago

@coderanger yep - let me change it to that.

mahmoudimus commented 10 years ago

@mjallday so it's not 1/2 dozen - one 6 of another

mahmoudimus commented 10 years ago

ping @coderanger addressed comments

coderanger commented 10 years ago

I really don't think this is a good idea, it only works if you assume only one build is in progress at a time. It removes more or less all the benefit of running each test on an isolated system. If the concern if just about speed, we should look in to baking partial AMIs instead. :-1:

mahmoudimus commented 10 years ago

@coderanger fair, but most likely, we'll have 1 build in progress @ a time for a particular svc, since there's a different machine running balanced-acceptance and precog-acceptance.

Until then, this is a quick dev workflow hack for us until we can bake partial AMIs which we've been told is not the time to prioritize this.

mahmoudimus commented 10 years ago

if the issue is that we need 2 reap, can we use autoscaling for this? /cc @mjallday

coderanger commented 10 years ago

I guess, but why not just do it all by hand then? With this, there is no easy way to kill the instance while a build is "in progress". The auto-scaling thing is not actually a thing, thats not a feature of EC2.

mahmoudimus commented 10 years ago

@coderanger i think the issue here is to address the issue of having to debug an instance and seeing why it's failing since we don't have error logs relayed from test-kitchen.

It will be very easy once we bake an AMI though, until then, I think we need this.

bninja commented 10 years ago

vote merge this and update our reaper to term old kitchen instances (doing that anyway). if its not useful now that we have acceptance working or interferes w/ |line revert.

coderanger commented 10 years ago

We should probably make a fab script or jenkins job that does a kitchen destroy then, just because there are going to be times when we need a big ol' reset button. Or at least document the process (SSH to builder, cd to folder, run bundle exec kitchen destroy) so people can do it :)

mahmoudimus commented 10 years ago

Closing.