arunoda / meteor-up-legacy

Production Quality Meteor Deployments
MIT License

Gracefully shutdown when deploying new version #486

Open hems opened 9 years ago

hems commented 9 years ago

Ideally we would be able to:

Is there any way of achieving this at the moment?

hems commented 9 years ago

@hsduk following your procedure we would still have to let the old server know that it will shut down due to the new version release, so it can gracefully shut down. Surely this would help in some cases ( :

hems commented 9 years ago

perhaps this is the file that should do all this magic?

https://github.com/arunoda/meteor-up/blob/mupx/templates/linux/start.sh

@arunoda perhaps we could benefit from adding https://www.consul.io to the project, so we could see services and versions from a nice dashboard?

MasterJames commented 9 years ago

I thought a rolling update takes one machine (out of 5, minimum) down at a time, and it doesn't need to be graceful: routing will catch an incomplete request and resend it to another machine internally. It also doesn't matter that two versions are deployed at the same time for a handful of minutes or more. So my understanding is that since you don't need to be graceful, you shouldn't. The purpose of the fail-safe multi-machine deployment means this is not worth the cycles (still, I expect a signal to shut down properly is sent anyway). Also, I'm thinking some of the features of Consul are already there, like the key-value store in etcd, etc. Still, it's probably best as an optional preference that is useful for some, but not desired by everybody.

arunoda commented 9 years ago

mupx (and mup) have rolling deployment support. It's enabled by default, so we don't work on graceful shutdown.

Graceful shutdown is an anti-pattern. In the cloud era, a server could die at any time, so you need to prepare your app for this. Graceful shutdown is a hurdle for that.

hems commented 9 years ago

I get that the app should be stateless, but sometimes it's very counter-intuitive, so it might be worth having something simple, like

process.on('SIGTERM', function () {
  // run whatever app-specific cleanup is needed before the process dies
  clean_my_stuff();
  process.exit(0);
});

:v:

MasterJames commented 9 years ago

I think you're right that somewhere in the rolling update it does this already, or should. I've not looked into that yet. In a way it's not needed. Basically, if a database write occurred and the server shuts down before a reply is fully returned, a new message will occur, so your Meteor code should not, for instance, cause a double increment. I'm uncertain whether Meteor would already recognize that it's a duplicate message. Still, if it's not the MongoDB master node in the Meteor cluster, it would probably not cause a problem (as that is routed there, or does it sync by watching the DB change log?). So maybe just cause a re-election (change the master Mongo container) if and only if it's the Mongo master updating. Either way, Meteor or Mongo will likely do the right thing (sorting and verifying change logs), but again, maybe your code needs to be foolproof too.
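For example, one way to make that kind of write idempotent (a sketch only; the method and collection names are made up for illustration) is to key it on a message id instead of blindly incrementing:

// Sketch only: record the message id so a request that is resent after a
// shutdown does not double-increment. Collection and method names are
// hypothetical.
var ProcessedMessages = new Mongo.Collection('processedMessages');
var Counters = new Mongo.Collection('counters');

Meteor.methods({
  recordHit: function (messageId) {
    check(messageId, String);
    if (!ProcessedMessages.findOne({ _id: messageId })) {
      ProcessedMessages.insert({ _id: messageId, at: new Date() });
      Counters.update({ _id: 'hits' }, { $inc: { count: 1 } }, { upsert: true });
    }
  }
});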

MasterJames commented 9 years ago

I remember now that the container will get a SIGTERM when shut down, so it will complete current requests before complying.
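If you wanted the app itself to stop accepting new connections and drain in-flight HTTP requests on that SIGTERM, a minimal sketch (not something mup does for you; it assumes WebApp.httpServer, the Node HTTP server Meteor exposes on the server side) might look like:

// Sketch only: on SIGTERM, stop accepting new connections, let in-flight
// requests finish, then exit.
Meteor.startup(function () {
  process.on('SIGTERM', function () {
    WebApp.httpServer.close(function () {
      process.exit(0);
    });
  });
});

Keep in mind Docker only waits a limited grace period after SIGTERM (10 seconds by default for docker stop) before sending SIGKILL, so whatever drain or cleanup you do has to finish quickly.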

hems commented 9 years ago

@MasterJames in my specific application I'm keeping track of not-logged-in users and communicating with them through DDP.

When my Meteor app shuts down:

At the moment, on the server, I listen to subscription.onStop to do my "clean up", but I'm not sure whether this is called when the server is being shut down during a deploy (it seems sometimes it isn't? I might be wrong).
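Roughly what that looks like (the publication name and the mark* helpers here are made up for illustration):

// Sketch of the current approach: per-subscriber cleanup in onStop.
Meteor.publish('anonymousPresence', function () {
  var self = this;
  markConnected(self.connection.id);        // hypothetical helper
  self.onStop(function () {
    // my "clean up" for this subscriber; unclear whether this runs when the
    // whole server is killed during a deploy
    markDisconnected(self.connection.id);   // hypothetical helper
  });
  self.ready();
});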

I did a test on my server:

// server/app.js
Meteor.startup(function () {
  process.on('SIGTERM', function () {
    console.log("sigterm");
  });
});

and it's being called twice when running Meteor locally, and I'm assuming the same happens on the server when using mup deploy.

I'm not sure if calling process.exit(0) after my "clean up" would actually mess with Meteor's own SIGTERM listener?
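One way to at least check whether Meteor (or Node) has already registered its own SIGTERM handlers before adding mine, as a diagnostic sketch using Node's process.listeners:

// Diagnostic sketch: count the SIGTERM listeners registered before ours.
Meteor.startup(function () {
  var existing = process.listeners('SIGTERM');
  console.log('existing SIGTERM listeners:', existing.length);
});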

MasterJames commented 9 years ago

I'm not an expert at this yet, so I hope someone else can help explain. My understanding is that the database master will be re-elected before the original completes its shutdown, or new requests will queue until that's done. You/we should not need to clean up, and should design the code to store things in the database. I recall you can automatically force a guest user account, so maybe you/we need to do that if you need something to persist. Also, storing that stuff in internal state may be an option.

I'm waiting for AWS EFS (preview only; I'm on a waitlist), as I suspect Galaxy is too. So my thoughts are coming from a hopefully feasible shared-folder perspective. I'm left wondering whether all container instances in the Meteor cluster are watching the change logs (I have not inspected that code yet either), or forward their DB requests?