evertrue / exhibitor-cookbook

Cookbook for Netflix Exhibitor
Other
16 stars 27 forks source link

Config template (re)render should restart exhibitor #10

Closed yourabi closed 7 years ago

yourabi commented 10 years ago

It doesn't look like the exhibitor service gets restarted on template change (such as adding new zk node)

jakedavis commented 10 years ago

So this is one of my confusion points on Exhibitor (we're still rolling this out so excuse my ignorance here); as I understood it, Exhibitor watches the common config in S3 or the shared FS and then automatically restarts itself to pick up the changes; this is how Exhibitor can manage the overall cluster (since they share the config and they're all watching). Could you elaborate on where the service should receive an explicit restart?

yourabi commented 10 years ago

mmm... could be that I'm confused here - so take this with a grain of salt...

My understanding was the any changes applied through the exhibitor console would be applied to the other servers, but I haven't seen anything that references polling/watching of the exhibitor properties file...

I use a chef query + template to generate the exhibitor.properties files to get around having a shared FS or using S3... I wasn't seeing changes picked up until I restarted exhibitor...

jakedavis commented 10 years ago

Ahhh I see what you mean! Okay, yeah, I think this is a bug then. Unfortunately things are super crazy for us right now, so I can't promise I'll get to it before Wednesday. I might be able to knock it out tomorrow, or if you want to PR the fix that is much appreciated too :)

yourabi commented 10 years ago

No worries - just creating the issue as a TODO/reminder. If I have time I'll circle back and take a stab.

jeffbyrnes commented 10 years ago

@yourabi I’d encourage the use of the shared config, btw. While it’s tempting to use Chef to manage Exhibitor’s config, restarting Exhibitor is, from what I can tell, a rather heavy-duty process, and it’s really meant to take care of itself (based on my experience). Also, you’re also potentially incurring downtime of your Zookeeper cluster (e.g., what if the nodes restart very close together?)

The S3 setup is pretty solid, and you can even use IAM profiles for your nodes to avoid having to distribute IAM access keys via Chef. That, of course, assumes you’re using EC2, which you may not.

Nonetheless, this should work as expected, but it should probably err on the side of not restarting Exhibitor.

yourabi commented 10 years ago

@jeffbyrnes thanks for the extra info... one reason I'm not super enthused with S3 shared config is that extra step in an outage at 3am...

Having to provision a new zk instance, then separately edit some config file in S3 vs just running one knife ec2 server create....

jeffbyrnes commented 10 years ago

@yourabi well, generally, once the config is in S3, that’s the last time you want to mess about with it. You can even have Chef initially provision that config, or, if it already exists, leave it alone. Exhibitor handles the rest moving forward.

A replacement instance, once it comes up, should add itself to the shared config and kick off a restart of the cluster so the other instances pick it up. That's how we've got it configured, and it works quite well.

Not to say I don't understand where you’re coming from; adding another piece to the stack should always be considered carefully.

jakedavis commented 10 years ago

Confusion continues ;)

So, there are two files: the one given by --defaultconfig and the one given by the shared config in ZK/S3/shared FS. The latter should not be managed by this cookbook, as far as I can tell. The former is ... only required if you use -c none? So we don't need to actually render this file unless someone isn't using one of the other types?

This is rather unclear in the docs and I haven't been in this cookbook for a couple weeks.

jeffbyrnes commented 10 years ago

A bit fuzzy, but I feel like the defaultconfig is used by Exhubitor as the starting point for the shared config, if the shared config does not exist. Otherwise, it's ignored. I, too, found the docs to be unclear as well.