Closed xiaoming-qxm closed 6 years ago
CAF starts a thread pool on startup. By default, it starts 2 * thread::hardware_concurrency() threads. "Overbooking" the cores pays off for most applications, since CPUs can switch to other actors while the current one waits for memory. Also, we use a work-stealing policy in which a worker periodically tries to steal tasks once it becomes idle. You could tweak the number of workers, stealing intervals, etc. to see if you are just measuring background noise here.
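To make the work-stealing idea a bit more concrete, here is a single-threaded toy sketch. It is not CAF's scheduler: toy_worker and next_job are hypothetical names, there is no synchronization, and CAF's actual victim selection and polling intervals are not modeled. It only shows the basic rule that a worker runs its own jobs first and takes one from another worker's queue when its own queue is empty.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a scheduler worker: just a queue of job IDs.
struct toy_worker {
  std::deque<int> jobs;
};

// Fetches the next job for worker `self`.
// Returns true and stores the job ID in `job` if any work was found.
// Assumption for illustration: a worker pops from the back of its own queue
// and steals from the front of the first non-empty victim queue.
inline bool next_job(std::vector<toy_worker>& workers, std::size_t self,
                     int& job) {
  auto& own = workers[self].jobs;
  if (!own.empty()) {
    job = own.back();
    own.pop_back();
    return true;
  }
  // Own queue is empty: look at the other workers in round-robin order.
  for (std::size_t i = 1; i < workers.size(); ++i) {
    auto& victim = workers[(self + i) % workers.size()].jobs;
    if (!victim.empty()) {
      job = victim.front();
      victim.pop_front();
      return true;
    }
  }
  return false; // Nothing to do anywhere; a real worker would wait and retry.
}
```

With two workers where only the first has jobs, calling next_job for the second worker removes one job from the first worker's queue. In a real scheduler every such attempt costs synchronization and wake-ups, which is one reason the stealing interval can show up in profiles.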
Which 2 * thread::hardware_concurrency() threads are you referring to? I only found the following code in the CAF source, and I think it starts just thread::hardware_concurrency() threads. Is there something I am missing? Are you also including the timer and logger background threads?
In abstract_coordinator.cpp:
num_workers_ = get_or(cfg, "scheduler.max-threads", sr::max_threads);
In defaults.cpp:
const size_t max_threads = std::max(std::thread::hardware_concurrency(), 4u);
In coordinator.cpp:
auto num = num_workers();
for (auto& w : workers_)
w->start();
Hm, seems like I should double-check my facts more often.
Are you also including the timer and logger background threads?
No, just a mistake on my end. However, the middleman also starts two threads: one for the middleman actor and one for the multiplexer.
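As a rough sketch of the numbers discussed so far: the snippet below mirrors the quoted default from defaults.cpp and adds the two middleman threads. The helper names default_worker_count and estimated_thread_count are hypothetical, and any other background threads are deliberately left out, so treat the result as an estimate rather than what CAF reports.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Mirrors the quoted default:
//   std::max(std::thread::hardware_concurrency(), 4u)
// The `hw` parameter is an assumption added so the logic can be checked
// for arbitrary core counts.
inline std::size_t
default_worker_count(unsigned hw = std::thread::hardware_concurrency()) {
  return std::max(hw, 4u);
}

// Scheduler workers plus the two middleman threads (middleman actor and
// multiplexer) when the I/O module is in use. Other threads are not counted.
inline std::size_t estimated_thread_count(unsigned hw, bool with_middleman) {
  return default_worker_count(hw) + (with_middleman ? 2u : 0u);
}
```

On a machine with 8 hardware threads this gives 8 workers (10 threads with the middleman), not 16. On a dual-core machine the lower bound of 4 workers applies.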
@Neverlord Why is the CAF mailbox implemented by blocking_actor? When I debugged my program, I got the following output:
caf::intrusive::lifo_inbox<caf::blocking_actor::mailbox_policy>::...
caf::blocking_actor::receive_impl
... ...
I'm not sure what you mean. Both "versions" of actors (scheduled and blocking) use caf::intrusive::lifo_inbox. They merely instantiate the mailbox with different policies. For example, blocking actors don't have separate queues for streaming-related messages.
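To illustrate what "same inbox, different policy" means, here is a minimal sketch of a policy-parameterized LIFO inbox. It is a toy and not CAF's implementation: toy_lifo_inbox, scheduled_policy, blocking_policy, and has_stream_queues are made-up names, and the real policies carry much more than a flag. The point is only that the policy type appears in the template argument, which is why a debugger prints blocking_actor::mailbox_policy inside the lifo_inbox type name.

```cpp
#include <atomic>
#include <utility>
#include <vector>

// Hypothetical policies. Each one only describes how the inbox is used.
struct scheduled_policy {
  using value_type = int;
  static constexpr bool has_stream_queues = true;
};

struct blocking_policy {
  using value_type = int;
  static constexpr bool has_stream_queues = false;
};

// Toy multi-producer LIFO inbox built on a single atomic head pointer.
template <class Policy>
class toy_lifo_inbox {
public:
  using value_type = typename Policy::value_type;

  ~toy_lifo_inbox() {
    take_all(); // Frees any nodes that were never consumed.
  }

  // Pushes a new element onto the stack (safe to call from many threads).
  void push(value_type x) {
    node* n = new node{std::move(x), head_.load(std::memory_order_relaxed)};
    // On failure, compare_exchange_weak reloads the current head into n->next.
    while (!head_.compare_exchange_weak(n->next, n, std::memory_order_release,
                                        std::memory_order_relaxed)) {
      // Retry until the new node is linked in.
    }
  }

  // Detaches the whole stack at once and returns it, newest element first.
  std::vector<value_type> take_all() {
    node* n = head_.exchange(nullptr, std::memory_order_acquire);
    std::vector<value_type> out;
    while (n != nullptr) {
      out.push_back(std::move(n->value));
      node* next = n->next;
      delete n;
      n = next;
    }
    return out;
  }

private:
  struct node {
    value_type value;
    node* next;
  };

  std::atomic<node*> head_{nullptr};
};
```

Both toy_lifo_inbox<scheduled_policy> and toy_lifo_inbox<blocking_policy> share the same code; only the policy type in the name differs.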
I am using CAF to port an MPI program to an actor-based one. It's no surprise that the actor-based program is faster than the MPI version. However, when I profile both with perf stat, the actor-based program shows more context switches and fewer instructions per cycle than the original MPI version, which doesn't make sense to me. I also profiled the CAF dining_philosophers example and found the same behavior. Below is the result: