Open johnzeng opened 7 years ago
Sounds like a bug.
I think the problem is that the supervisor tries to reconnect too quickly, before mongodb has restarted. The retries exceed the supervisor's restart intensity within the period, so the supervisor itself is terminated.
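To illustrate the restart-intensity behaviour described above, here is a minimal sketch of an OTP supervisor whose only child fails immediately, the way a worker would if mongodb were still down. The module name `intensity_demo_sup`, the `start_worker/0` helper, and the `econnrefused` exit reason are hypothetical and are not part of the mongodb-erlang code; only the `supervisor` calls and flags are standard OTP.

```erlang
-module(intensity_demo_sup).
-behaviour(supervisor).
-export([start_link/0, start_worker/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Hypothetical worker that "fails to connect" right away.
%% It is spawned from the supervisor process, so it is linked to it.
start_worker() ->
    Pid = spawn_link(fun() -> exit(econnrefused) end),
    {ok, Pid}.

init([]) ->
    %% At most 3 restarts are tolerated within any 10-second window.
    %% One more restart inside that window makes the supervisor give up
    %% and terminate itself along with its children.
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
    Worker = #{id => flaky_worker,
               start => {?MODULE, start_worker, []},
               restart => permanent},
    {ok, {SupFlags, [Worker]}}.
```

Because the worker dies as soon as it starts, the restarts happen back to back and the limit is reached almost immediately. If nothing supervises this supervisor, nothing brings it back, which matches the symptom reported in this issue. Raising `intensity`, or adding a delay before the worker retries, are the usual ways to survive a short outage.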
Yes, I noticed that too when testing this issue. I am now thinking about refactoring the mongoc module, as it is very hard for me to dive into this code. I think this error will be solved after the refactoring. But if it is critical for you, you can open a PR against master with the old code.
No, it's not critical. I happened to find this when I was switching my mongodb server, and it won't happen frequently. I can wait for the refactoring, and I think that will be better.
The only thing that starts mc_pool_sup is mongoc:connect/3, which ends up doing mc_pool_sup:start_link/0 in the calling process. So if you happen to call mongoc:connect/3 in a process that ignores the exit, like I did (e.g. a supervisor, which always ignores 'EXIT' from non-children), then nothing will restart it. There are a few possible solutions I can think of:

1. Return the mc_pool_sup pid from mongoc:connect/3 if it had to be started, but this would change its return type.
2. Remove the call to mc_pool_sup:ensure_started/0 from mongoc:connect/3, and document that you must start mc_pool_sup before using mongoc.
3. Ditto, but just start mc_pool_sup from mc_super_sup.
4. Use supervisor:start_child(mc_super_sup, ...) in mc_pool_sup:ensure_started/0 instead of calling start_link/0 directly.
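Option 4 from the comment above could look roughly like the sketch below. It is not the library's code: `pool_sup_demo`, `demo_super_sup` and `demo_pool_sup` are hypothetical stand-ins for mc_super_sup and mc_pool_sup, and the restart limits are arbitrary. The point is only that the pool supervisor becomes a child of the top-level supervisor instead of being linked to whichever process happened to call connect.

```erlang
-module(pool_sup_demo).
-behaviour(supervisor).
-export([start_top/0, ensure_pool_sup/0, init/1]).

%% Stand-in for mc_super_sup: the application's top-level supervisor.
start_top() ->
    supervisor:start_link({local, demo_super_sup}, ?MODULE, top).

%% Stand-in for ensure_started/0: attach the pool supervisor to the
%% top-level supervisor rather than linking it to the caller.
ensure_pool_sup() ->
    Spec = #{id => demo_pool_sup,
             start => {supervisor, start_link,
                       [{local, demo_pool_sup}, ?MODULE, pool]},
             restart => permanent,
             type => supervisor,
             modules => [?MODULE]},
    case supervisor:start_child(demo_super_sup, Spec) of
        {ok, Pid} -> {ok, Pid};
        %% Already running: treat repeated calls as success.
        {error, {already_started, Pid}} -> {ok, Pid};
        {error, Reason} -> {error, Reason}
    end.

%% Both supervisors start without children in this sketch.
init(top) ->
    {ok, {#{strategy => one_for_one, intensity => 5, period => 10}, []}};
init(pool) ->
    {ok, {#{strategy => one_for_one, intensity => 5, period => 10}, []}}.
```

With this arrangement the caller's exit handling no longer matters: if the pool supervisor dies, the top-level supervisor restarts it, subject to its own restart intensity. The return type of the connect call would not need to change.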
I use mongodb_topology_pool to pool my mongodb requests, but I found that if mongodb is shut down, mongodb_topology_pool is unable to reconnect, so I have to restart the whole application.
I have tried to figure out what's wrong, and it looks like mc_pool_sup is always down after the workers hit a connection error.