Closed gswallow closed 4 years ago
Hi,
I've added an additional code change. Instead of blindly trying to connect for 15 seconds and then giving up, it checks the process table for 'mongod' and keeps trying to connect for much longer. The total timeout works out to 33 minutes, 20 seconds (2000 retries).
Yeah, 2000 retries may be overkill, but I've witnessed TokuMX taking a very long time to restart.
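For readers following along, here is a minimal sketch of the retry idea described above. It is not the code from the pull request: the method name, its parameters, and the choice to stop early when the process disappears are all assumptions made for illustration. The one-second delay is also an assumption, though it is what 2000 retries over 33 minutes, 20 seconds (2000 seconds) would imply.

```ruby
# Hypothetical helper, not the cookbook's actual implementation.
# process_running and try_connect are callables supplied by the caller,
# so the loop itself does not depend on any particular MongoDB driver.
def wait_for_service(process_running:, try_connect:, max_retries: 2000, delay: 1,
                     sleeper: ->(seconds) { sleep(seconds) })
  max_retries.times do
    # Assumption: if the process is gone from the process table, waiting longer is pointless.
    return :not_running unless process_running.call
    # Stop as soon as a connection attempt succeeds.
    return :ready if try_connect.call

    sleeper.call(delay)
  end
  # Every attempt failed while the process stayed alive.
  :timed_out
end
```

A caller could pass something like `-> { system('pgrep', '-x', 'mongod', out: File::NULL) }` as `process_running` and wrap the driver's connection attempt in a lambda that returns true or false for `try_connect`. Injecting both keeps the loop easy to test without a running server.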
@gswallow This is related to #376: the service resource is started, but it isn't ready to be used yet, which creates a lot of issues. If we knew when the service was really ready and only continued after that, it would help in many cases.
Hi,
I am working on restoring a replica set from an EBS snapshot. When I bring up three new instances with Chef, I hit line 101 of libraries/mongodb.rb, where the condition doesn't match the error message I get back. The error message returned is:
"local.oplog.rs is not empty on the initiating member. cannot initiate."
Because I'm restoring volumes from a snapshot, none of the new MongoDB servers have the same IP addresses as before. If I create a new config document containing the new hostnames and pass it in through rs.reconfig, my replica set comes up.
If I modify line 101:
elsif result.fetch('errmsg', nil) =~ /(\S+) is already initiated/ || (result.fetch('errmsg', nil) == 'already initialized') || (result.fetch('errmsg', nil) =~ /is not empty on the initiating member/)
Then Chef will do this for me, but I'm not sure whether it's safe. I'll submit a pull request if it is.
Thanks!
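As an illustration of the condition discussed above, the sketch below splits the three error-message checks into named cases, so the "oplog not empty" case can be handled separately from the "already initiated" ones. The method name, constants, and return symbols are hypothetical and do not come from libraries/mongodb.rb. The sketch says nothing about whether reconfiguring in that case is safe, which is the open question in this thread.

```ruby
# Hypothetical classification of the result hash returned by a replica set initiate call.
# The patterns are the ones quoted in the modified line 101; everything else is illustrative.
ALREADY_INITIATED_PATTERNS = [
  /(\S+) is already initiated/,
  /\Aalready initialized\z/
].freeze
OPLOG_NOT_EMPTY_PATTERN = /is not empty on the initiating member/.freeze

def classify_initiate_result(result)
  errmsg = result.fetch('errmsg', nil)
  return :none if errmsg.nil?
  return :already_initiated if ALREADY_INITIATED_PATTERNS.any? { |pattern| errmsg =~ pattern }
  # Kept as its own case so a caller can decide explicitly whether to reconfigure.
  return :oplog_not_empty if errmsg =~ OPLOG_NOT_EMPTY_PATTERN

  :other
end
```

With this split, a caller could log or guard the `:oplog_not_empty` branch (for example behind a node attribute) instead of treating it exactly like an already-initiated set.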