Why does CAF show high context switches and low instructions per cycle in perf stat?

actor-framework / actor-framework

An Open Source Implementation of the Actor Model in C++

http://actor-framework.org/

BSD 3-Clause "New" or "Revised" License


Why does CAF show high context switches and low instructions per cycle in perf stat? #752

Closed xiaoming-qxm closed 6 years ago

xiaoming-qxm commented 6 years ago

I am using CAF to port an MPI program to an actor-based one. Unsurprisingly, the actor-based program runs faster than the MPI version.

However, when I profiled both with perf stat, I found that the actor-based program has more context switches and fewer instructions per cycle than the original MPI version, which doesn't make sense to me.

I also profiled the CAF dining_philosophers example and found the same behavior.

I am running on Ubuntu 18.04
CAF version: 0.15.7
Linux kernel version: 4.15

Below is the result:

Neverlord commented 6 years ago

CAF starts a thread pool on startup. By default, it starts 2 * thread::hardware_concurrency() threads. "Overbooking" the cores pays off for most applications, since CPUs can switch to other actors while the current one waits for memory. Also, we use a work-stealing policy in which a worker that becomes idle periodically tries to steal tasks. You could tweak the number of workers, stealing intervals, etc. to see whether you are just measuring background noise here.
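To get a feel for how much activity an idle, polling thread pool generates on its own, here is a minimal sketch in plain C++ with no CAF involved. The function count_idle_wakeups and its parameters are hypothetical. The workers only sleep and wake up, which stands in for "an idle worker periodically tries to steal tasks". It is a toy model under that assumption, not CAF's scheduler.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Toy model (assumption, not CAF code): `workers` threads have no work and
// wake up every `interval` to look for some. Returns the total number of
// wakeups observed during roughly `duration`.
std::size_t count_idle_wakeups(std::size_t workers,
                               std::chrono::milliseconds interval,
                               std::chrono::milliseconds duration) {
  std::atomic<bool> stop{false};
  std::atomic<std::size_t> wakeups{0};
  std::vector<std::thread> pool;
  for (std::size_t i = 0; i < workers; ++i) {
    pool.emplace_back([&stop, &wakeups, interval] {
      do {
        // Blocking here lets the OS run something else on this core.
        std::this_thread::sleep_for(interval);
        // Woke up and found nothing to "steal".
        ++wakeups;
      } while (!stop.load());
    });
  }
  std::this_thread::sleep_for(duration);
  stop = true;
  for (auto& t : pool)
    t.join();
  return wakeups.load();
}
```

Calling, say, count_idle_wakeups(8, std::chrono::milliseconds(1), std::chrono::seconds(1)) from main and running the binary under perf stat should typically report context switches on Linux even though no useful work is done, which gives a baseline for judging how much of a measurement is background noise.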

xiaoming-qxm commented 6 years ago

Which 2 * thread::hardware_concurrency() threads are you referring to? I only found the following code in the CAF source, and I think it starts just thread::hardware_concurrency() threads. Is there something I am missing? Are you also including the timer and logger background threads?

In abstract_coordinator.cpp:

num_workers_ = get_or(cfg, "scheduler.max-threads", sr::max_threads);

In defaults.cpp:

const size_t max_threads = std::max(std::thread::hardware_concurrency(), 4u);

In coordinator.cpp:

auto num = num_workers();
for (auto& w : workers_)
  w->start();
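For reference, the default from the defaults.cpp line quoted above can be evaluated on the machine in question using only the standard library. default_worker_count is a hypothetical helper name. It merely repeats the expression from the excerpt and does not read any CAF configuration, so a value set for "scheduler.max-threads" would not be reflected.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical helper: same expression as the quoted defaults.cpp line.
// std::thread::hardware_concurrency() may return 0 if the value cannot be
// determined; std::max then yields the lower bound of 4.
std::size_t default_worker_count() {
  return std::max(std::thread::hardware_concurrency(), 4u);
}
```

Printing default_worker_count() next to the number of MPI ranks used in the comparison shows whether both programs run with the same degree of parallelism.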

Neverlord commented 6 years ago

Hm, seems like I should double-check my facts more often.

"Are you also including the timer and logger background threads?"

No, just a mistake on my end. However, the middleman also starts two threads: one for the middleman actor and one for the multiplexer.

xiaoming-qxm commented 6 years ago

@Neverlord Why is the CAF mailbox implemented by blocking_actor? When I debugged my program, I got the following output:

caf::intrusive::lifo_inbox<caf::blocking_actor::mailbox_policy>::...
caf::blocking_actor::receive_impl
... ...

Neverlord commented 6 years ago

I'm not sure what you mean. Both "versions" of actors (scheduled and blocking) use caf::intrusive::lifo_inbox. They merely instantiate the mailbox with different policies. For example, blocking actors don't have separate queues for streaming-related messages.
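The idea of one mailbox template instantiated with different policies can be illustrated with a small, self-contained sketch. Everything here (toy_inbox, with_stream_queue, without_stream_queue) is hypothetical and far simpler than caf::intrusive::lifo_inbox. It only shows why a policy type appears as a template argument in a stack trace and how a policy can decide whether a separate stream queue is used.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical policies; the names are illustrative, not CAF's.
struct with_stream_queue {
  static constexpr bool stream_queue = true;
};

struct without_stream_queue {
  static constexpr bool stream_queue = false;
};

// One mailbox implementation, configured at compile time by its policy.
template <class Policy>
class toy_inbox {
public:
  void push(std::string msg, bool is_stream) {
    // Only policies with a stream queue separate stream-related messages.
    if (Policy::stream_queue && is_stream)
      stream_.push_back(std::move(msg));
    else
      normal_.push_back(std::move(msg));
  }

  std::size_t normal_size() const { return normal_.size(); }
  std::size_t stream_size() const { return stream_.size(); }

private:
  std::vector<std::string> normal_;
  std::vector<std::string> stream_;
};
```

With this layout, toy_inbox<with_stream_queue> and toy_inbox<without_stream_queue> are distinct types built from the same code, and a debugger would print the policy name inside the angle brackets, much like caf::blocking_actor::mailbox_policy in the output above.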