fubark / cyber

Fast and concurrent scripting.
https://cyberscript.dev
MIT License

Async is not the holy grail #27

Open dumblob opened 1 year ago

dumblob commented 1 year ago

Saw some preliminary mentions of async support in a future version of Cyber. I just wanted to point out that async is far from being a solution:

- What color is your function?
- Asynchronous Everything
- Why is async "slow" and can not be made fast without making it sync 1
- Why is async "slow" and can not be made fast without making it sync 2

Instead I would prefer Cyber exploring the space outlined in Proper support for distributed computing, parallelism and concurrency (yeah, I am obviously biased :wink:).

matu3ba commented 1 year ago

> Cyber exploring the space outlined

To take things in a usable direction, it would help to provide claims plus actual benchmark code to validate them. Your description of 1-4 is thread-per-core scheduling. Synchronization across sockets, nodes, or clusters is still handled with MPI or simplified with tools like Hadoop, and/or it requires hardware fiddling to squeeze optimal performance out of the machine.

Since the main reason for async is to avoid stalling on I/O or blocking syscalls beyond a bounded amount of time (at least for scripting), I would defer the decision until there is more experience from the lower layers (the Zig folks are doing that work for us). The most notable hardware development in computing will likely be a dramatic increase in data-write performance, and experiments without actual hardware or accurate software models are almost useless.

dumblob commented 1 year ago

Thanks for the link - I am glad that the Zig devs know about CXL (and the like) becoming the norm in the next couple of years (I share @twoclocks's opinion).

Admittedly though I am a bit lost here in this thread because what @twoclocks wrote further underlines that async/await (i.e. event loop with necessary - small but non-negligible - overhead) is not the way to go if one seeks a generic concurrent construct with the lowest possible overhead (definitely lower than async/await) which is able to leverage multicore HW.

Do not get me wrong - the approach I linked is certainly not the only one to explore. And by explore I really meant "try and see" (not "it is the final holy grail solution which you must go for") :wink:.


Btw. CXL (and RAM access generally) is super slow (by 1-2 orders of magnitude) compared to where the "sync vs async" stuff is being contested (i.e. in this discussion thread). Sync leverages CPU caches much better than async, and that is usually the sole reason why I would strongly advise against using async as "the holy abstraction for parallelism", as that would force you to pay this very high price every time with no way to get rid of it.

With sync it is a different story: if you had sync parallelism primitive(s), you could use them directly (to utilize the CPU much better) and implement async on top of sync in the places where it makes sense (there are not many such places in well-written high-perf apps which leverage the "work processors" scheme I outlined in https://github.com/vlang/v/discussions/11608#discussioncomment-1365359 and https://github.com/vlang/v/discussions/11608#discussioncomment-1365353 ). Imagine having an API supporting multiple interleaved (or at least embedded) event loops as @twoclocks proposed - would that not be awesome?
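A minimal sketch of the "async layered on top of sync" idea, in Python rather than Cyber. The names `checksum`, `run_sync`, `run_async` and `demo` are made up for illustration, and a thread pool merely stands in for the "work processors"; Python threads share the GIL, so this shows only the layering, not the performance argument.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def checksum(data: bytes) -> int:
    # Plain synchronous work; it knows nothing about event loops.
    return sum(data) % 65536


def run_sync(pool: ThreadPoolExecutor, chunks):
    # Direct use of the sync primitive: submit jobs, then block on the results.
    futures = [pool.submit(checksum, chunk) for chunk in chunks]
    return [future.result() for future in futures]


async def run_async(pool: ThreadPoolExecutor, chunks):
    # Async as an optional layer: the same pool, but awaited from an event loop.
    loop = asyncio.get_running_loop()
    tasks = [loop.run_in_executor(pool, checksum, chunk) for chunk in chunks]
    return await asyncio.gather(*tasks)


def demo():
    chunks = [bytes([i]) * 1000 for i in range(4)]
    with ThreadPoolExecutor(max_workers=2) as pool:
        direct = run_sync(pool, chunks)
        layered = asyncio.run(run_async(pool, chunks))
    return direct, layered
```

Both paths run the same `checksum` on the same pool; only the way the caller waits for the result differs.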

TwoClocks commented 1 year ago

I did some async benchmarks a while back. You can find them here: TwoClocks/coroutine-benchmarks. They've bit-rotted a little, and I should update them. But I don't see much of a difference between sync and async. What little difference there is, is far smaller than a single context switch. This makes sense to me, as most async implementations are some form of stackless coroutines/CPS. There just isn't much difference between the two.

In my mental model, async is just syntactical sugar over callbacks. Some languages try to make it more by adding libraries to select() over multiple awaits or make them cancelable. I have a low opinion of all that added cruft. But my use case is very narrow.
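To make the "sugar over callbacks" view concrete, here is a small Python sketch (not Cyber; `fetch_cb`, `job_cb`, `fetch`, `job` and `run_both` are hypothetical names). The same two-step job is written once with explicit continuations and once with async/await.

```python
import asyncio


def fetch_cb(key, on_done):
    # Callback style: the caller passes in what should run next.
    on_done(f"value-of-{key}")  # stands in for an I/O completion


def job_cb(on_done):
    # Two dependent steps, nested by hand.
    def after_first(a):
        def after_second(b):
            on_done([a, b])
        fetch_cb("b", after_second)
    fetch_cb("a", after_first)


async def fetch(key):
    await asyncio.sleep(0)  # yield to the event loop once
    return f"value-of-{key}"


async def job():
    # Same control flow; the continuations are implied by each await.
    a = await fetch("a")
    b = await fetch("b")
    return [a, b]


def run_both():
    results = []
    job_cb(results.append)
    results.append(asyncio.run(job()))
    return results
```

Both versions produce the same result; the async form just lets the language write the nesting.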

I think a decent event-loop and job scheduler should work fine in either a sync or async framework with minimal perf difference. Neither should care much about how work continues.

My comments in the Zig thread were more about the IO APIs than about sync vs async. I think zero-copy IO is going to be "the next big thing", and if your event loop takes a buffer to copy into when calling read() (like almost all do now), then your API is going to be sub-optimal "soon". There are lots of zero-copy event loops out there in low-latency/high-throughput environments. The ones I know of are all bespoke and internal-only, as they are highly hardware-dependent. I think CXL is going to change that.
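A rough illustration of the API difference, in Python and with invented class names (`CopyingSource`, `LoaningSource`); real zero-copy loops hand out driver- or hardware-owned memory, which a `bytearray` only imitates here. The first API copies into a caller-supplied buffer, the second lends out a view of the buffer it already owns.

```python
class CopyingSource:
    """read(buf)-style API: the caller supplies a buffer and data is copied in."""

    def __init__(self, payload: bytes):
        self._data = bytearray(payload)  # stands in for a driver-owned buffer
        self._pos = 0

    def read_into(self, buf: bytearray) -> int:
        n = min(len(buf), len(self._data) - self._pos)
        buf[:n] = self._data[self._pos:self._pos + n]  # this is the copy
        self._pos += n
        return n


class LoaningSource:
    """Loan-style API: the source hands out a view of its own buffer."""

    def __init__(self, payload: bytes):
        self._data = bytearray(payload)
        self._pos = 0

    def next_chunk(self, max_len: int) -> memoryview:
        end = min(self._pos + max_len, len(self._data))
        view = memoryview(self._data)[self._pos:end]  # no bytes are copied
        self._pos = end
        return view


def demo():
    buf = bytearray(5)
    n = CopyingSource(b"hello world").read_into(buf)
    loan = LoaningSource(b"hello world")
    view = loan.next_chunk(5)
    # view.obj is the buffer the view points into.
    return bytes(buf[:n]), bytes(view), view.obj is loan._data
```

The loan-style signature is the one that can survive a move to zero-copy hardware, because the caller never owns the destination buffer.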

IcedQuinn commented 1 year ago

> In my mental model, async is just syntactical sugar over callbacks. Some languages try to make it more by adding libraries to select() over multiple awaits or make them cancelable. I have a low opinion of all that added cruft. But my use case is very narrow.

I do like them being sugar to replace callback hell. It is far more readable to see what is going on when an instruction sheet just looks like a list of requests for futures and then routing them somewhere else.

In Lua you have to resort to shenanigans with coroutine yields and in Nim they are just a compile-time macro that tries to replace everything with equivalent code using futures and some kind of resumable state machine.
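For readers unfamiliar with the "resumable state machine" idea, here is a hand-written Python sketch. It is not what Nim's macro actually emits; `steps_generator`, `StepsMachine` and the driver functions are invented for illustration. A coroutine with two suspension points is shown next to an equivalent class in which locals become fields and each yield point becomes a state.

```python
def steps_generator():
    # Coroutine form: suspends twice, then returns the sum of what it was sent.
    total = 0
    x = yield "need first"
    total += x
    y = yield "need second"
    total += y
    return total


class StepsMachine:
    """Hand-written equivalent: locals become fields, yields become states."""

    def __init__(self):
        self.state = 0
        self.total = 0

    def resume(self, value=None):
        if self.state == 0:
            self.state = 1
            return ("yield", "need first")
        if self.state == 1:
            self.total += value
            self.state = 2
            return ("yield", "need second")
        if self.state == 2:
            self.total += value
            self.state = 3
            return ("done", self.total)
        raise RuntimeError("already finished")


def drive_generator(a, b):
    gen = steps_generator()
    next(gen)        # run to the first yield
    gen.send(a)      # run to the second yield
    try:
        gen.send(b)  # run to completion
    except StopIteration as stop:
        return stop.value


def drive_machine(a, b):
    machine = StepsMachine()
    machine.resume()
    machine.resume(a)
    return machine.resume(b)[1]
```

Calling `drive_generator(2, 3)` and `drive_machine(2, 3)` walks both versions through the same three resumptions.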