esa-tu-darmstadt / tapasco

The Task Parallel System Composer (TaPaSCo)
GNU Lesser General Public License v3.0
Make blocking interrupts configurable #271

Open forflo opened 3 years ago

forflo commented 3 years ago

In essence, I am using

Job *job = ...;
...
intptr_t res1 = tapasco_job_start(job, &jl);
intptr_t res2 = tapasco_job_release(job, 0, true);

to start a PE via its PEID. tapasco_job_release appears to busy-wait on one CPU core. Is this the expected behavior of the runtime, or is it an error on my side?

jahofmann commented 3 years ago

We use eventfd for that: https://github.com/esa-tu-darmstadt/tapasco/blob/8b21b7fa7d0c4142402554425b5b9b337d45d20f/runtime/libtapasco/src/interrupt.rs#L86

The read does not block; it returns an error if nothing is available. In that case the thread yields, but in general this is still a "busy wait". This part is optimized for low latency.
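The read-and-yield pattern described above can be sketched in C on a plain Linux eventfd. This is only an illustration under stated assumptions, not the libtapasco code (which is written in Rust): wait_polling and signal_completion are hypothetical helper names, and the eventfd is assumed to have been created with EFD_NONBLOCK.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper, not part of the TaPaSCo API.
 * Polls a non-blocking eventfd until its counter is non-zero and
 * returns the counter value. Returns 0 on an unexpected error. */
uint64_t wait_polling(int efd)
{
    uint64_t count = 0;
    assert(efd >= 0);
    for (;;) {
        ssize_t n = read(efd, &count, sizeof count);
        if (n == (ssize_t)sizeof count)
            return count;        /* an event was pending */
        if (n < 0 && (errno == EAGAIN || errno == EINTR)) {
            sched_yield();       /* nothing yet: yield, then retry */
            continue;
        }
        return 0;                /* unexpected error */
    }
}

/* Hypothetical helper: simulates a completion event by adding 1
 * to the eventfd counter. Returns 0 on success, -1 on failure. */
int signal_completion(int efd)
{
    uint64_t one = 1;
    assert(efd >= 0);
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}
```

The thread never sleeps in the kernel here, so a new event is noticed quickly, but one core stays busy for as long as the wait lasts.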

If you are fine with higher latencies, you can change the line at https://github.com/esa-tu-darmstadt/tapasco/blob/8b21b7fa7d0c4142402554425b5b9b337d45d20f/runtime/libtapasco/src/pe.rs#L137 to:

interrupt: Interrupt::new(completion, interrupt_id, true).context(ErrorInterrupt)?,

This should actually be part of the configuration file tbh, so a user can easily change between the modes.

forflo commented 3 years ago

Thank you for the quick reply!

One follow-up question: Does this influence the speed at which the PE processes its input?

jahofmann commented 3 years ago

No, only the speed at which the interrupt is noticed by the host. So for long-running PEs this is less of an issue.

Check out slides 12 and 13 at https://www.mcs.anl.gov/events/workshops/ross/2020/slides/ross2020-heinz.pdf

The "Original Runtime" has basically the same performance as the Rust runtime with blocking waits.

forflo commented 3 years ago

Out of curiosity: Did you compare the power consumption of busy wait and interrupt-based wait strategies? I am wondering how much power is used up by the PCIe-IP.

forflo commented 3 years ago

> The "Original Runtime" has basically the same performance as the Rust runtime with blocking waits.

The motivation behind rebuilding everything in Rust was memory safety and ease of maintenance, was it not?

jahofmann commented 3 years ago

> Out of curiosity: Did you compare the power consumption of busy wait and interrupt-based wait strategies? I am wondering how much power is used up by the PCIe-IP.

PCIe typically uses around 30 W regardless of usage, but that has nothing to do with the busy waiting. The busy waiting only affects the power usage of the host. There is no polling over PCIe; the host simply polls the eventfd status flag. Its main purpose is to keep the CPU from entering deep sleep states, which would result in very long and unpredictable latencies. If you don't care about those, you can simply switch to blocking waits.
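For comparison, a blocking wait on an eventfd can be sketched in C as follows. This is an illustration only, not the libtapasco implementation: wait_blocking and signal_after are hypothetical helpers, the eventfd is assumed to have been created without EFD_NONBLOCK, and a forked child process stands in for the interrupt source.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not part of the TaPaSCo API.
 * Sleeps in the kernel until the eventfd counter is non-zero and
 * returns the counter value. Returns 0 on an unexpected error. */
uint64_t wait_blocking(int efd)
{
    uint64_t count = 0;
    assert(efd >= 0);
    for (;;) {
        ssize_t n = read(efd, &count, sizeof count);
        if (n == (ssize_t)sizeof count)
            return count;        /* woken up by an event */
        if (n < 0 && errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        return 0;                /* unexpected error */
    }
}

/* Hypothetical stand-in for the interrupt source: forks a child that
 * signals the eventfd after delay_ms milliseconds.
 * Returns the child's pid in the parent, or -1 if fork failed. */
pid_t signal_after(int efd, long delay_ms)
{
    assert(efd >= 0);
    pid_t pid = fork();
    if (pid == 0) {
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        uint64_t one = 1;
        nanosleep(&ts, NULL);
        ssize_t n = write(efd, &one, sizeof one);
        _exit(n == (ssize_t)sizeof one ? 0 : 1);
    }
    return pid;
}
```

While the caller is inside read, it consumes no CPU time. The cost is the wake-up latency, which grows if the core has dropped into a deep sleep state in the meantime.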

> The motivation behind rebuilding everything in Rust was memory safety and ease of maintenance, was it not?

Yes, pretty much. It is much easier to extend, has better error handling, comes with a huge selection of easy-to-use libraries, and is at least as fast in the tested scenarios.

forflo commented 3 years ago

Thank you! 👍