Better Master / cli() error handling

LearnBoost / cluster

Node.JS multi-core server manager with plugins support.

http://learnboost.github.com/cluster

MIT License

2.29k stars 159 forks source link

Better Master / cli() error handling #84

Closed kainosnoema closed 13 years ago

kainosnoema commented 13 years ago

I saw you had a closed issue about cli(), but there still seemed to be some loopholes. The first commit adds some simple try/catch blocks to better handle running cli() commands when no master pid or process exists, the second one prevents zombie processes from sticking around after errors thrown on listen() (port in use, etc).

I didn't have time to write tests, but its simple enough that there shouldn't be any problems... works great for me.

tj commented 13 years ago

hmmmm not sure how I feel about this. they are not really zombies, just orphaned, and most of the time will still operate fine but I know what you mean about it being annoying with development etc. I'll pull it down and tweak if necessary

kainosnoema commented 13 years ago

Ya I know what you mean about the second commit... technically its only relevant when you're using cli() as there's no way to reconnect and shut them down (that I know of?).

Go ahead and tweak how you see fit. Thanks!

tj commented 13 years ago

we still have the pids, I'll tweak it so that subsequent stop calls etc still work to signal the children even if the master is gone

kainosnoema commented 13 years ago

Ok that sounds great. So help me understand then, children can still serve requests without a master? Guess I just assumed that wouldn't work. Now I'm really curious what you're doing (not a simple proxy?).... will check that out.

tj commented 13 years ago

yup they can, it's not the best situation obviously, since there is no longer a master to manage them but it works fine. It's really simple actually, the workers are the only ones accept()ing, master is just managing the cluster itself, re-spawning workers etc but has nothing to do with the request/response cycle at all. The kernel load balances because each worker that is idle will be trying to accept() but of course only one can accept the connection at a time so the load balancing is free, and much faster than if we were to proxy.

it could also be possible to have a worker take the role of the dead master, it could ping master, if master is dead it could become the new master etc, but ideally master should just never die

kainosnoema commented 13 years ago

Brilliant. That's actually how Unicorn works as well... much better than some other clustering/balancing methods I've seen in Node. I hadn't taken the time to dig into your implementation, but now I'm actually much more excited about getting this into production. Since master isn't handling any requests, it should be fine. The issues I'm dealing with are just development related...

tj commented 13 years ago

it's pretty common from what I've seen. Not sure how practical it would be but it would be neat to have clusters of clusters haha, the master could receive the fd and pass it down to the child masters and to their workers etc

kainosnoema commented 13 years ago

Ya I'm fairly new to Node and the networking side of things, but I've seen a lot of really poor implementations. I think we'll be using this heavily though, so I'll brush up on things and see how I can contribute more intelligently!

tj commented 13 years ago

sounds good man. I think it's going in an interesting direction, the only other web server I've seen so easily configured is of course nginx