Closed bjori closed 8 years ago
It looks like this was raised here:
It looks like mongo-orchestration provides all the needed detail in debug logging. Evergreen needs to spawn MO with debug-level logging so we can track down the issues. In this case it looks like the mongod/mongos processes didn't respond within 180 seconds.
All log output goes to a log file, however. You'll need access to the MO
log file to see more detail about the Exception. FWIW, a TimeoutError
means that a mongo[ds] process failed to start in a reasonable amount of
time. The reason why this happened can sometimes be learned from the stderr
output of the process, and sometimes it's only visible in the MongoDB log
file, so there's a limit to how helpful MO can be here. However, I think MO
could at least forward mongo[ds]'s stderr output to the log and stderr.
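A rough sketch of what forwarding stderr could look like, written against plain `subprocess` and `logging` rather than MO's actual code. The names `start_process`, `StartupError`, the `is_ready` callback and the `mo_sketch` logger are all hypothetical and only here for illustration:

```python
import logging
import subprocess
import sys
import tempfile
import time

logger = logging.getLogger("mo_sketch")  # hypothetical logger name


class StartupError(Exception):
    """Hypothetical error raised when the child never becomes ready."""


def start_process(cmd, is_ready, timeout=180.0, poll_interval=0.1):
    """Start cmd and wait until is_ready() returns True.

    If the child exits early or the timeout expires, forward the child's
    stderr to the log and to our own stderr, then raise StartupError.
    """
    # A temp file (not a pipe) so a chatty long-running child cannot block.
    err_file = tempfile.TemporaryFile(mode="w+")
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=err_file)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            break  # child exited before it became ready
        if is_ready():
            err_file.close()  # the child keeps its own copy of the descriptor
            return proc
        time.sleep(poll_interval)

    # Failure path: make sure the child is gone, then collect its stderr.
    if proc.poll() is None:
        proc.kill()
    proc.wait()
    err_file.seek(0)
    stderr_text = err_file.read()
    err_file.close()

    for line in stderr_text.splitlines():
        logger.error("%s stderr: %s", cmd[0], line)
    sys.stderr.write(stderr_text)
    raise StartupError(
        "%r was not ready within %ss (exit code %s); stderr:\n%s"
        % (cmd, timeout, proc.returncode, stderr_text)
    )
```

This only covers what the process writes to stderr. Anything that ends up solely in the MongoDB log file would still need to be read from that file.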
On Wed, Sep 23, 2015 at 9:57 AM, Bernie Hackett notifications@github.com wrote:
It looks like this was raised here:
It looks like mongo-orchestration provides all the needed detail in debug logging. Evergreen needs to spawn MO with debug-level logging so we can track down the issues.
MO spawns the mongo[d|s] and knows where it logs to, right? I think it's easier to include that log in the error than it is for an arbitrary process to do the detective work of figuring out which mongod it was and where its logs are, since, as in this case, MO was asked to bring up a full sharded cluster.
I think this was resolved as part of #199 (cat the failed server log to the MO log). Feel free to reopen if there's more to be done here.
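For anyone landing here later, the general idea of attaching the failed server's log to the error can be sketched as below. This is not the code from #199, just an illustration; `tail_lines`, `timeout_error`, the member name and the log path are all made up for the example:

```python
import errno
from collections import deque


def tail_lines(path, max_lines=20):
    """Return the last max_lines lines of the file at path."""
    try:
        with open(path, "r", errors="replace") as f:
            # deque with maxlen keeps only the most recent lines
            return list(deque(f, maxlen=max_lines))
    except OSError as exc:
        return ["<could not read %s: %s>\n" % (path, exc)]


def timeout_error(member, log_path, timeout_s, max_lines=20):
    """Build a TimeoutError that says what timed out and shows its log tail."""
    tail = "".join(tail_lines(log_path, max_lines))
    message = "%s did not respond within %s seconds; last lines of %s:\n%s" % (
        member, timeout_s, log_path, tail)
    return TimeoutError(errno.ETIMEDOUT, message)
```

A caller that gives up waiting on a server could then `raise timeout_error("shard member on port 27017", "/tmp/mongod.log", 180)`, so the message names the process and carries its most recent log lines instead of only the errno.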
The mongoc and C++ drivers have relatively frequent issues with mongo-orchestration which we are unable to debug.
See for example: https://evergreen.mongodb.com/task_log_raw/mongo_c_driver_ubuntu_1204_64_integration_test_2.6_sharded_7c5dbed32ef4ca2fe9960193e77cda31a9faa239_15_09_17_20_32_42/0?type=T#L57
Improving the error reporting would improve the user experience dramatically, not to mention make it easier to debug and understand the issue. Currently MO just dumps a stack trace as a one-liner, which doesn't help anyone. After scouring the long line for anything useful, we see that an exception was thrown: raise TimeoutError(errno.ETIMEDOUT, message)\n", "TimeoutError: 110\n"
What timed out? And why?