dumblob opened this issue 1 year ago
> Cyber exploring the space outlined
To take things in a usable direction, it would help to provide claims plus actual benchmark code to validate those claims. Your description of 1-4 is thread-per-core scheduling. Synchronization across sockets, nodes, or clusters is still hand-tailored with MPI or simplified with tools like Hadoop, and optimization requires hardware fiddling to squeeze the best performance out of the machine.
As async's main purpose is to prevent stalling during I/O or blocking syscalls for more than a limited amount of time (at least for scripting), I would defer the decision until there is more experience from the lower layers (the Zig folks are doing the work for us).
The most notable upcoming hardware development in computing will likely be a dramatic increase in data-write performance, and experiments without actual hardware or accurate software models are almost useless.
Thanks for the link - I am glad that the Zig devs know about CXL (and the like) becoming the norm in the next couple of years (I share @twoclocks's opinion).
Admittedly, though, I am a bit lost here in this thread, because what @twoclocks wrote further underlines that async/await (i.e. an event loop with its necessary - small but non-negligible - overhead) is not the way to go if one seeks a generic concurrency construct with the lowest possible overhead (definitely lower than async/await) that is able to leverage multicore HW.
Don't get me wrong - the approach I linked is certainly not the only one to explore. And by "explore" I really meant "try and see" (not "this is the final holy-grail solution which you must go for") :wink:.
Btw. CXL (and RAM access generally) is super slow (1-2 orders of magnitude) compared to where the "sync vs async" contest actually plays out (i.e. in this discussion thread). `sync` leverages CPU caches much better than `async`, and that is usually the sole reason why I would strongly advise against using `async` as "the holy abstraction for parallelism": it would force you to pay this very high price every time, with no way to get rid of it.

With `sync` it is a different story - if you had `sync` parallelism primitive(s), you could use them directly (to utilize the CPU much better) and implement `async` on top of `sync` in the places where it makes sense (there are not many such places in well-written high-perf apps which leverage the "work processors" scheme I outlined in https://github.com/vlang/v/discussions/11608#discussioncomment-1365359 and https://github.com/vlang/v/discussions/11608#discussioncomment-1365353 ). Imagine an API supporting multiple interleaved (or at least embedded) event loops, as @twoclocks proposed - would that not be awesome?
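To make the "implement `async` on top of `sync`" direction concrete, here is a minimal sketch in Python (all names - `Future`, `WorkerPool`, `submit` - are invented for illustration, not any Cyber/V/Zig API): the only building blocks are plain sync primitives (threads, a queue, an event), and an async-looking `submit`/future interface is layered on top.

```python
# Sketch only: an async-style future/submit layer built purely on
# sync primitives (threads + queue + event). Names are hypothetical.
import queue
import threading

class Future:
    """Minimal future: a result slot guarded by a sync event."""
    def __init__(self):
        self._done = threading.Event()
        self._result = None

    def set(self, value):
        self._result = value
        self._done.set()

    def get(self):
        # Plain sync wait - the OS scheduler does the suspension for us.
        self._done.wait()
        return self._result

class WorkerPool:
    """Thread-per-core style pool: workers pull jobs from a shared queue."""
    def __init__(self, workers=4):
        self._jobs = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            fn, args, fut = self._jobs.get()
            fut.set(fn(*args))

    def submit(self, fn, *args):
        # Async-looking call, sync machinery underneath.
        fut = Future()
        self._jobs.put((fn, args, fut))
        return fut

pool = WorkerPool()
futs = [pool.submit(lambda x: x * x, n) for n in range(4)]
print([f.get() for f in futs])  # [0, 1, 4, 9]
```

The point of the sketch is the layering direction: the sync primitives stand alone and can be used directly where performance matters, while the future-shaped veneer is opt-in for the few places where it helps.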
I did some async benchmarks a while back. You can find them here: TwoClocks/coroutine-benchmarks
They've bit-rotted a bit. I should update them.
But I don't see much of a difference between `sync` and `async`. What little difference there is, is far smaller than a single context switch.

This makes sense to me, as most `async` implementations are some form of stackless coroutines/CPS. There just isn't much difference between the two.

In my mental model, `async` is just syntactic sugar over callbacks. Some languages try to make it more by adding libraries to `select()` over multiple `await`s or make them cancelable. I have a low opinion of all that added cruft. But my use case is very narrow.
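A small Python illustration of the "sugar over callbacks" claim (the functions and the driver here are made up for the example): the code after an `await` is exactly what a hand-written continuation-passing-style (CPS) version passes around as a callback.

```python
# Illustrative only: the async function and the hand-written CPS
# version below do the same work.
import asyncio

async def fetch_and_double(get_value):
    v = await get_value()          # suspension point
    return v * 2

def fetch_and_double_cps(get_value_cb, done):
    # The code after `await` becomes an explicit continuation callback.
    def continuation(v):
        done(v * 2)
    get_value_cb(continuation)

# Driving the CPS version with a source that "completes" immediately:
results = []
def fake_get(cb):
    cb(21)
fetch_and_double_cps(fake_get, results.append)
print(results)  # [42]
```

Straight-line code with one suspension point maps onto one callback; the compiler (or `asyncio` here) just hides the plumbing.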
I think a decent event loop and job scheduler should work fine in either a `sync` or `async` framework with minimal perf difference. Neither should care much about how work continues.

My comments in the Zig thread were more about the IO APIs than about `sync` vs `async`. I think zero-copy IO is going to be "the next big thing", and if your event loop takes a buffer to copy into when calling `read()` (like almost all do now), then your API is going to be sub-optimal "soon". There are lots of zero-copy event loops out there in low-latency/high-throughput environments. The ones I know of are all bespoke and internal-only, as they are highly hardware-dependent. I think CXL is going to change that.
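To show the API-shape difference being described, here is a toy comparison in Python (both class names and their `read` signatures are invented for illustration): the classic shape copies into a caller-owned buffer, while the zero-copy shape lends the caller a view into the loop's own buffer and never moves the bytes.

```python
# Hypothetical API shapes only - real zero-copy loops are hardware-specific.

class CopyingLoop:
    """Classic shape: caller owns a buffer, the loop copies into it."""
    def __init__(self, incoming: bytes):
        self._incoming = incoming

    def read(self, buf: bytearray) -> int:
        n = min(len(buf), len(self._incoming))
        buf[:n] = self._incoming[:n]     # the copy we would like to avoid
        return n

class ZeroCopyLoop:
    """Zero-copy shape: the loop lends a read-only view; no bytes move."""
    def __init__(self, incoming: bytes):
        self._incoming = memoryview(incoming)

    def read(self, n: int) -> memoryview:
        view, self._incoming = self._incoming[:n], self._incoming[n:]
        return view                      # caller must release it promptly

zc = ZeroCopyLoop(b"hello world")
chunk = zc.read(5)
print(bytes(chunk))  # b'hello'
```

The key difference is who owns the memory: in the second shape the loop does, which is what makes the API awkward to retrofit onto a `read(buf)`-style interface later.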
> In my mental model, async is just syntactical sugar over callbacks. Some languages try to make it more by adding libraries to select() over multiple awaits or make them cancelable. I have a low opinion of all that added cruft. But my use case is very narrow.
I do like them as sugar that replaces callback hell. It is far more readable when the code just looks like a list of requests for futures that are then routed somewhere else.
In Lua you have to resort to shenanigans with coroutine yields and in Nim they are just a compile-time macro that tries to replace everything with equivalent code using futures and some kind of resumable state machine.
Saw some preliminary mentions of `async` support in the future Cyber. I just wanted to point out that async is by far not a solution:

- What color is your function?
- Asynchronous Everything
- Why is `async` "slow" and can not be made fast without making it `sync` 1
- Why is `async` "slow" and can not be made fast without making it `sync` 2

Instead I would prefer Cyber exploring the space outlined in Proper support for distributed computing, parallelism and concurrency (yeah, I am obviously biased :wink:).