facebookarchive / bistro

Bistro is a flexible distributed scheduler, a high-performance framework supporting multiple paradigms while retaining ease of configuration, management, and monitoring.
https://bistro.io
MIT License

bistro_scheduler startup error Singleton N6wangle12_GLOBAL__N_113PollerContextE requested before registrationComplete() call #10

Closed ghost closed 7 years ago

ghost commented 7 years ago

I got the docker build to run, but 4 of the ctest tests failed, and bistro_scheduler aborts at startup with the error "wangle...PollerContextE requested before registrationComplete() call". What am I missing?

Test failure:

export os_image=ubuntu:16.04
export gcc_version=5
make_parallelism=2 ./build/fbcode_builder/travis_docker_build.sh

$ docker run -it 1e47cff229f0 bash
nobody@5fa08110f0b5:/home/bistro/bistro/cmake/Debug$ ctest
Test project /home/bistro/bistro/cmake/Debug
      Start  1: test_async_read_pipe
 1/56 Test  #1: test_async_read_pipe .................. Passed 0.02 sec
      Start  2: test_async_read_pipe_rate_limiter
...
93% tests passed, 4 tests failed out of 56

Total Test time (real) = 38.80 sec

The following tests FAILED:
    11 - test_worker (OTHER_FAULT)
    19 - test_thrift_monitor (OTHER_FAULT)
    28 - test_scheduler (OTHER_FAULT)
    51 - test_remote_runner (OTHER_FAULT)
Errors while running CTest


bistro_scheduler startup error.

root@27cb23c3eb07:/home/bistro/bistro# ./cmake/Debug/server/bistro_scheduler \
  --server_port=6789 --http_server_port=6790 \
  --config_file=scripts/test_configs/simple --clean_statuses \
  --CAUTION_startup_wait_for_workers=1 --instance_node_name=scheduler
I0406 14:21:51.122525 37 AutoTimer.h:142] Read config from /home/bistro/bistro/scripts/test_configs/simple in 89.35 us
I0406 14:21:51.122921 37 AutoTimer.h:142] Parsed config with 1 jobs in 275.8 us
I0406 14:21:51.123087 37 AutoTimer.h:142] Have 7 nodes after manual in 48.02 us
I0406 14:21:51.123237 40 Monitor.cpp:74] Updating monitor histogram (/home/bistro/bistro/monitor/Monitor.cpp:60): Monitor transiently not making a histogram for simple_job since it is not loaded
W0406 14:21:51.124105 42 RemoteWorkerRunner.cpp:89] RemoteWorkerRunner initial wait (/home/bistro/bistro/runners/RemoteWorkerRunner.cpp:75): DANGER! DANGER! Your --CAUTION_startup_wait_for_workers of 1 is lower than the max healthcheck gap of 125, which makes it very likely that you will start second copies of tasks that are already running (unless your heartbeat interval is much smaller). No initial worker set ID consensus. Waiting for all workers to connect before running tasks.
I0406 14:21:51.124487 43 Bistro.cpp:184] Idle wait...
I0406 14:21:51.126633 37 HTTPMonitorServer.cpp:130] Launched HTTP Monitor Server on port 6790, result 0
F0406 14:21:51.127137 37 Singleton-inl.h:241] Singleton N6wangle12_GLOBAL__N_113PollerContextE requested before registrationComplete() call.
Check failure stack trace:
    @ 0x7f3873f8e5cd google::LogMessage::Fail()
    @ 0x7f3873f90433 google::LogMessage::SendToLog()
    @ 0x7f3873f8e15b google::LogMessage::Flush()
    @ 0x7f3873f8e379 google::LogMessage::~LogMessage()
    @ 0x7f38713e7ba2 folly::detail::SingletonHolder<>::createInstance()
    @ 0x7f38713e6b9c folly::detail::SingletonHolder<>::try_get()
    @ 0x7f38713e64d7 folly::Singleton<>::try_get()
    @ 0x7f38713e5790 wangle::FilePoller::init()
    @ 0x7f38713e5654 wangle::FilePoller::FilePoller()
    @ 0x7f3871fc3ccd _ZSt11make_uniqueIN6wangle10FilePollerEJRKNSt6chrono8durationIlSt5ratioILl1ELl1EEEEEENSt9_MakeUniqIT_E15__singleobjectEDpOT0
    @ 0x7f3871fc2b4d apache::thrift::SecurityKillSwitchPoller::SecurityKillSwitchPoller()
    @ 0x7f3871fc2a89 apache::thrift::SecurityKillSwitchPoller::SecurityKillSwitchPoller()
    @ 0x7f3871fe4dcd apache::thrift::ThriftServer::ThriftServer()
    @ 0x7f3871fe49ef apache::thrift::ThriftServer::ThriftServer()
    @ 0xc796e0 _ZN9gnu_cxx13new_allocatorIN6apache6thrift12ThriftServerEE9constructIS3_JEEEvPTDpOT0
    @ 0xc78c37 _ZNSt16allocator_traitsISaIN6apache6thrift12ThriftServerEEE9constructIS2_JEEEvRS3_PTDpOT0
    @ 0xc781d4 std::_Sp_counted_ptr_inplace<>::_Sp_counted_ptr_inplace<>()
    @ 0xc7719d _ZNSt14shared_countILN9__gnu_cxx12_Lock_policyE2EEC2IN6apache6thrift12ThriftServerESaIS6_EJEEESt19_Sp_make_shared_tagPT_RKT0DpOT1
    @ 0xc7629e std::shared_ptr<>::shared_ptr<>()
    @ 0xc75676 _ZNSt10shared_ptrIN6apache6thrift12ThriftServerEEC2ISaIS2_EJEEESt19_Sp_make_shared_tagRKTDpOT0
    @ 0xc74462 std::allocate_shared<>()
    @ 0xc73004 _ZSt11make_sharedIN6apache6thrift12ThriftServerEJEESt10shared_ptrITEDpOT0
    @ 0xc7004e main
    @ 0x7f387019f830 libc_start_main
    @ 0xc6f6c9 _start
    @ (nil) (unknown)
Aborted (core dumped)

snarkmaster commented 7 years ago

Thanks for the report. There was a recent change in folly that requires explicit initialization for all folly::Singletons. Thrift has some singletons under the hood, so it fails. I was sure that the fix had landed, but I guess not. Should be fixed tomorrow.

Until then, you can patch bistro/server/main.cpp locally:

-  gflags::ParseCommandLineFlags(&argc, &argv, true);
-  google::InitGoogleLogging(argv[0]);
+  folly::init(&argc, &argv);

snarkmaster commented 7 years ago

You'd also need #include <folly/init/Init.h> at the top.
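
To illustrate why the order of initialization matters here, the following is a toy model of a strict singleton registry. It is only a sketch and not folly's implementation: `ToyVault`, `tryLookup`, and `demo` are hypothetical names invented for this example. The idea it mimics is the one described above: a singleton lookup is rejected until the program's init step has marked registration as complete, which is the role `folly::init` plays in the patch.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Toy stand-in for a strict singleton registry. NOT folly's code:
// ToyVault and its methods are hypothetical names used for illustration.
class ToyVault {
 public:
  // Called once by the program's init step (the role folly::init plays).
  void registrationComplete() { complete_ = true; }

  // Hands out the singleton, but refuses if the init step has not run yet.
  const std::string& get(const std::string& name) const {
    if (!complete_) {
      throw std::logic_error(
          "Singleton " + name +
          " requested before registrationComplete() call");
    }
    static const std::string instance = "poller-context";
    return instance;
  }

 private:
  bool complete_ = false;
};

// Returns true if the lookup succeeded, false if the vault rejected it.
bool tryLookup(const ToyVault& vault) {
  try {
    vault.get("PollerContext");
    return true;
  } catch (const std::logic_error& e) {
    std::cerr << e.what() << '\n';
    return false;
  }
}

// Walks through the failing order and then the fixed order.
// Returns true if the toy vault behaved as described.
bool demo() {
  ToyVault vault;
  const bool before = tryLookup(vault);  // rejected: init has not run
  vault.registrationComplete();          // analogous to calling init first
  const bool after = tryLookup(vault);   // accepted
  return !before && after;
}
```

Calling `demo()` from a `main` function exercises both orders. In the real scheduler the lookup happens inside the `ThriftServer` constructor, so the init call has to come before the server is constructed, which is what the patch to `main.cpp` arranges.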

snarkmaster commented 7 years ago

Fix landed: https://github.com/facebook/bistro/commit/fee6eb548642636862ca6e3839617bb8dadc425a