Open ms705 opened 9 years ago
An alternative option would be to move to CamIO2, which can be made to sleep, but will still have an asyc pattern for speed.
On 14 Jul 2015, at 4:57 pm, Malte Schwarzkopf notifications@github.com wrote:
Currently, the coordinator spins hard on incoming messages in the StreamSocketsAdapter when doing asynchronous receives. This was the easiest way of implementing the functionality, but is not actually required.
Instead, we should rework the StreamSocketsAdapter and StreamSocketsChannel code to use the Boost ASIO proactor pattern correctly (with a finishing async receive callback setting up the next async receive), or move to a libev or select-based implementation.
This isn't a correctness issue, but a performance one: the coordinator currently uses an entire CPU core on each machine. Usually, that's not an issue, but we may as well not waste resources unnecessarily.
— Reply to this email directly or view it on GitHub https://github.com/ms705/firmament/issues/30.
Now I met the same problem with the latest code, the coordinator just uses 100% cpu I'm not sure whether it is caused by this issue?
ping @ms705 Is there updates in this issue?
Hi @WIZARD-CXY,
Sorry for the delayed response -- we've been bogged down with a bunch of paper deadlines recently.
This issues is still outstanding. Our plan to fix it is to replace our own home-baked, polling RPC layer with gRPC, which will fix the spinning. However, it will take us a while to implement this.
In the meantime, you can work around the issue by adding a sleep()
or nanosleep()
invocation at line 83 in stream_sockets_adapter.h. If you sleep there (e.g., for 5ms), you will slow down the receive loop, leading to reduced CPU consumption. However, your RPC processing latency will also be increased.
Hope that helps!
@ms705 Thanks, I will give it a shot tomorrow!
Is this the right trick?I added usleep above. but the coordinator still using 100% cpu @ms705
@WIZARD-CXY Ah -- this workaround only works if there actually are any channels (i.e., other running coordinators or running tasks). Otherwise, the code falls into the early exit condition on line 59 and you spin until channels show up.
Here's a patch that works even when there are no channels (with the same provision re RPC latency as above):
diff --git a/src/engine/node.cc b/src/engine/node.cc
index 5c01a75..6353e55 100644
--- a/src/engine/node.cc
+++ b/src/engine/node.cc
@@ -106,6 +106,7 @@ void Node::AwaitNextMessage() {
VLOG(3) << "Waiting for next message from adapter...";
m_adapter_->AwaitNextMessage();
- //boost::this_thread::sleep(boost::posix_time::seconds(1));
+ usleep(500000);
}
In other words, add the usleep
invocation to AwaitNextMessage()
in src/engine/node.cc
. If you add this, you can get rid of the usleep
statements in stream_sockets_adapter.h
.
It works now@ms705, thanks
Currently, the coordinator spins hard on incoming messages in the
StreamSocketsAdapter
when doing asynchronous receives. This was the easiest way of implementing the functionality, but is not actually required.Instead, we should rework the
StreamSocketsAdapter
andStreamSocketsChannel
code to use the Boost ASIO proactor pattern correctly (with a finishing async receive callback setting up the next async receive), or move to alibev
orselect
-based implementation.This isn't a correctness issue, but a performance one: the coordinator currently uses an entire CPU core on each machine. Usually, that's not an issue, but we may as well not waste resources unnecessarily.