devinus / poolboy

A hunky Erlang worker pool factory
http://github.com/devinus/poolboy
ISC License
1.55k stars 347 forks source link

new_worker failed cause poolby crashed #72

Closed linbo closed 9 years ago

linbo commented 9 years ago

Hi,

We use poolboy to manage redis connection pool, in production environment find redis connection closed sometimes.

Checked code in poolboy.erl line 275, find if new_worker failed, in our example, maybe redis start failed due to network error, it will cause poolboy crashed, all other existing redis connections will be closed.

Maybe new_worker should deal with failed case, and shouldn't affect the other processes managed by poolboy.

devinus commented 9 years ago

@linbo Add smarts to your worker to handle disconnects. This is not for the pool itself to handle. See https://github.com/interline/epgsql_pool/blob/master/src/epgsql_pool_worker.erl#L58 for inspiration.

linbo commented 9 years ago

@devinus @Vagabond @onkel-dirtus I don't get it. I mean poolboy_worker init/start_link failed, it will cause poolboy crashed.

As my understanding, poolboy is a pool, start new worker failed should not cause other workers in the pool terminated.

And refer to https://github.com/devinus/poolboy/blob/master/src/poolboy_worker.erl, start_link can returned {error, Reason}, why eat up error information but return ok?

devinus commented 9 years ago

@linbo What does your worker look like?

linbo commented 9 years ago

@devinus It is eredis (https://github.com/wooga/eredis), now I refer to https://github.com/phoenixframework/phoenix/blob/master/lib/phoenix/pubsub/redis_conn.ex, try add a proxy process to deal with this issue.

devinus commented 9 years ago

Yes, your worker itself needs to deal with disconnects/reconnects like Phoenix's example. You never want to let a supervisor handle disconnects/reconnects because of worker churn and restart strategy taking down the entire pool.

linbo commented 9 years ago

Hi, I don't know where can discuss poolboy design, just comment here.

I add a proxy process (https://github.com/yunba/eredis_pool/blob/master/src/eredis_proxy.erl), normally it works fine. But for redis connection timeout, I find eredis_proxy will be closed by poolboy.

I checked code (https://github.com/devinus/poolboy/blob/master/src/poolboy.erl#L309), when checkin process and overflow >0, it will close checkin process.

For my scenario, maybe network or other cases happened, new overflow process connect to redis are all timeout, at same time old good checkin processes will be closed. Then this time, maybe all connected redis client will be closed.

I am not sure how to deal this issue.