Closed wake-up-neo closed 8 years ago
Here is the issue: it waits indefinitely.
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00007fc7195f2c7c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00000000031b4140 in folly::EventBase::runInEventBaseThreadAndWait(std::function const&) ()
#3 0x0000000001de33e7 in HPHP::CurlMultiAwait::CurlMultiAwait(HPHP::req::ptr<HPHP::CurlMultiResource>, double) ()
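For readers less familiar with this kind of trace: frame 1 is std::condition_variable::wait, which takes no timeout, so a caller whose notification never arrives stays blocked forever. That would be consistent with the 0 CPU hang described in this issue. The sketch below only illustrates the difference between an unbounded wait and a bounded one. It is not folly's code and not the fix from any pull request; runAndWaitFor and State are hypothetical names made up for the example.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical helper (not part of folly or HHVM): run `fn` on another
// thread and wait for it to finish, but give up after `timeout` instead
// of blocking forever. Returns true if `fn` finished in time.
bool runAndWaitFor(std::function<void()> fn,
                   std::chrono::milliseconds timeout) {
  struct State {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
  };
  // Shared ownership keeps the state alive even if the caller times out
  // and returns before the worker thread finishes.
  auto state = std::make_shared<State>();

  std::thread worker([state, fn] {
    fn();
    {
      std::lock_guard<std::mutex> lock(state->m);
      state->done = true;
    }
    state->cv.notify_one();
  });
  worker.detach();

  std::unique_lock<std::mutex> lock(state->m);
  // With cv.wait(lock, pred) this line would block forever if the worker
  // never signalled. wait_for returns the predicate's value on timeout.
  return state->cv.wait_for(lock, timeout, [&] { return state->done; });
}
```

A caller can then treat a false return as "the other thread is stuck" and log or abort the request, rather than parking the request thread indefinitely.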
Could this be related to a very large number of tiny (a few bytes) and very fast (milliseconds) HTTP requests, combined with https://bugs.php.net/bug.php?id=61141 ?
Finally fixed https://github.com/facebook/hhvm/pull/7205/files
I'm reopening this issue since the pull request is stuck in discussion with no progress.
HHVM Version
HipHop VM 3.13.1 (rel) Compiler: tags/HHVM-3.13.1-0-g4f382ad928a6e2a0607a8dcb251002aca77f11f6 Repo schema: 655b5912cb8136e9df6f9be972153e38ac446e0f
The problem
All HHVM threads hang, and I have no step-by-step way to reproduce it. It may happen once a day or once a week on one of the servers, with no correlation to high load, traffic, etc.
What actually happens is that hhvm(-cgi) stops responding, with 0 CPU usage. The process in the example below had been running for about 24 hours, I believe, with lots of successfully completed tasks.
Then strace -f -p {PID}
gdb
server.ini
php.ini
It doesn't make sense to publish my code here, since it doesn't reproduce the problem by itself. Assume I'm using code very similar to the example at https://docs.hhvm.com/hack/reference/function/curl_multi_await/
I tried changing
$select = await curl_multi_await($mh);
to
$select = curl_multi_select($master);
to make it synchronous, but all threads still hang sometimes. I also tried every combination of await I could think of, including await AwaitAllWaitHandle::fromArray, \HH\Asio\join($awaitableTasks), ->getWaitHandle()->join(), etc., and it made no difference.
I have all the necessary timeouts at the curl level and at the HHVM level, plus a hard-coded "break" after a "critical" timeout inside the while (curl_multi_exec) loop, which I added after first hitting the issue and which doesn't solve it. All descriptors are closed properly, etc. I no longer know what to do, since I have been trying to track this problem down for two months.
Here is an example of a similar server with the same code and configuration; it has not had this problem at all for as long as I have been using it. You can see that from the number "6126010". The other servers, which may get stuck, show the same strace as below, but once they hang, every thread except the master thread starts showing FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
I would appreciate any help with this issue.