GorNishanov / coroutines-ts


Coroutines and dynamic allocation #37

Open pylorak opened 5 years ago

pylorak commented 5 years ago

AFAIK, even if there are ways to "guide" the compiler or to make it more/less probable, there is currently no guarantee a coroutine frame will be allocated statically.

I don't speak standardese myself, so unfortunately I cannot propose wording for the standard. I do, however, ask you to please define a mechanism, or a set of rules, under which coroutine frame allocations are guaranteed to be static. This might not matter for desktop, but coroutines are a very powerful C++ feature that can bring (for lack of a better phrase) "virtual multitasking" to any embedded device with a C++ compiler.

Needless to say, dynamic allocations are a strong no-go in most embedded contexts. As an embedded engineer myself, I see a lot of opportunities in using coroutines on embedded devices (think not Linux-capable single-board computers, but microcontrollers with a bit more RAM), and this new C++ feature is something I think could redefine how people should write embedded code in the future.

Avoiding dynamic allocations for coroutine frames might be a micro-optimization on desktop computers, and I assume this is the reason it wasn't considered a very important feature so far. But on low-end embedded devices coroutines are a true game-changer, and avoiding dynamic allocations there becomes extremely important. Coroutines in this context would let us write much cleaner, more concise, and safer code with fewer bugs, where previous code (unlike on desktop machines) had no alternative until coroutines emerged. It would be an absolute shame not to be able to use coroutines in an embedded context because of dynamic allocations.

I am well aware that there are many cases where dynamic allocation of coroutine frames cannot be avoided. However, identifying a set of cases (even if very few) where it can be eliminated, making elision obligatory for the compiler in those cases, and clearly communicating those cases to developers (e.g. via the standard) would bring advantages that are not mere micro-optimizations, but true game-changers on embedded devices.

GorNishanov commented 5 years ago

This paper describes conditions required for heap allocation elision optimization to work: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html

This paper outlines a bit how you can work in environments where you cannot use dynamic allocations: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1365r0.pdf

pylorak commented 5 years ago

Hello Gor,

Thank you for the response. Yes, I've seen the second paper already, but it is a bit different from what I am proposing. Both links you gave explore the situations where coroutine heap allocation elision becomes possible (or maybe even expected).

But I'm talking about making the elision in these cases mandatory for the compiler, not just optional. This is so that code can be made reliably portable across compilers (the compiler landscape in the embedded world is far more diverse and far less actively developed than on desktop).

GorNishanov commented 5 years ago

So far, we have not been able to come up with standard wording to make the elision mandatory. (Note that it took 19 years to figure out how to specify mandatory NRVO.) The only way to make sure elision happens is to declare operator new in the promise and not define its body. Then, if elision does not happen, you get a linker error; otherwise, you are good to go.

daveedvdv commented 5 years ago

s/NRVO/RVO/.

(NRVO is still not mandatory.)

GorNishanov commented 5 years ago

Thank you for the correction :-), Daveed

MartyMcFlyInTheSky commented 1 year ago

I wanted to drop in as another embedded developer to emphasize the importance of what @pylorak is suggesting. The aforementioned mandatory elision could be a big chance to implement continuation-passing fork-join parallelism without compiler intrinsics and assembly. Currently, dynamic allocation really is a major obstacle to implementing low-overhead forking from the main thread of execution. If elision could be guaranteed, we could possibly gain low-cost parallelism abstractions even for single-core systems (or systems like my esp32c6, which has two cores, one of them a low-power core, so switching to it seamlessly at runtime would be a lot easier).