EngineerBetter / concourse-up

Deprecated - use Control Tower instead
https://github.com/EngineerBetter/control-tower
Apache License 2.0

Concourse died - what's the best procedure to bring it back up? #28

Closed aterreno closed 6 years ago

aterreno commented 6 years ago

Hi, this is more a question than an issue; perhaps some docs on the subject would be great.

Our monitoring says that our Concourse went down:

Pingdom DOWN alert: Concourse (xxx) is down since 16/12/2017 00:51:15. Reason: Network is unreachable

I didn't bother investigating and just ran a concourse-up deploy.

Is that a 'decent' approach?

Is it possible that a new release of concourse-up broke the web worker?

Thanks.

aterreno commented 6 years ago

BTW, that didn't work. The timestamp of the new version of concourse-up is 2017-12-16 00:36:47 +0000, and our Concourse has been down since 16/12/2017 00:51:15.

Seems quite correlated...

DanielJonesEB commented 6 years ago

Thanks for the report @aterreno - one of our folks is looking into it. There's a problem with the upgrade path whereby newly-required values are not present in the config file in S3, and aren't generated either. Our testing wasn't good enough to catch this, so we're going to look into that.

We're looking into fixes now.

aterreno commented 6 years ago

@DanielJonesEB cool man, a couple of things (something I didn't bother figuring out when everything was working well)

I hope these questions will make concourse-up even better. I am happy to write some docs if you give me a couple of hints on how it works.

Cheers

takeyourhatoff commented 6 years ago

@aterreno Can you try redeploying with concourse-up v0.7.3? We have fixed a problem whereby new fields added to the ATC config were not properly initialised on existing deployments and have improved our testing to catch this in the future.
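For readers following along, the class of fix described above (initialising newly added fields on a config that predates them) can be sketched roughly as follows. This is only an illustration under stated assumptions, not concourse-up's actual code: the `Config` struct, its field names, and the `backfill` helper are all hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Config is a hypothetical stand-in for the deployment config persisted in S3.
// The field names are illustrative and are not concourse-up's real schema.
type Config struct {
	Deployment    string
	Password      string
	EncryptionKey string // imagined field introduced by a newer release
}

// randomHex returns n random bytes, hex-encoded (2n characters).
func randomHex(n int) (string, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// backfill generates any required value that an older config lacks.
// It reports whether the config changed, so the caller knows to persist it.
func backfill(c *Config) (bool, error) {
	changed := false
	if c.EncryptionKey == "" {
		key, err := randomHex(16)
		if err != nil {
			return false, err
		}
		c.EncryptionKey = key
		changed = true
	}
	return changed, nil
}

func main() {
	// Simulate a config written by an older release: EncryptionKey is empty.
	old := Config{Deployment: "ci", Password: "existing"}
	changed, err := backfill(&old)
	if err != nil {
		panic(err)
	}
	fmt.Println("config changed:", changed)
}
```

Running the backfill on every load keeps it idempotent: existing values are never overwritten, and a second run on the same config changes nothing.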

aterreno commented 6 years ago

Hi, thanks. I ended up re-installing Concourse from scratch as we couldn't wait for the fix. It's updated now and all good.

I'll close the issue.