Open leandro-lucarella-frequenz opened 1 year ago
Additional context: In the experimental branches, this change resulted in a speed up of the datasourcing benchmark from 30k/s to 70k/s samples
I think there are some issues with doing this:
Sender
everywhere. And independently developed libraries will have unnecessary compatibility issues because one expects sync and another one provides async.Making it sync
improves the performance in single-thread contexts, but it will limit our ability to scale beyond a single thread, in the future. So I'm against this change.
I thought the idea was multi-process, not multi-thread? It's an important distinction here..
we won't be able to support inter-process communication through tcp if we switch to sync.
I doubt that statement to be honest. Right now, all send does is put the sample in a queue. If that queue is full, it drops the oldest sample.
Networked or not, this never needs an async
send.
There are various strategies to handle the network part, the most important aspect is whether we need reliability or not. If we do, it needs to be async
of course, otherwise not.
I believe we have both requirements, some need reliability, some don't.
At the moment the data sourcing actor is simply infinitely creating tasks so this is basically just a rather inefficient queue with the only job to call one function which puts the sample.. in another queue.
we won't be able to support inter-process communication through tcp if we switch to sync. I doubt that statement to be honest. Right now, all send does is put the sample in a queue. If that queue is full, it drops the oldest sample.
These are both good points. If we have a send queue when we go networked, then we'll need a sending task that consumes from the queue, because right now is the receiver which consumes from the queue, but when there is a socket in the middle, that won't be the case.
But if we use a queue and we have (sync) send bursts, it will be very hard that the queue gets full and we lose messages, which is probably not what we want. Relegating the buffering to the socket might be a good approach.
But then, when we use TCP, then suddenly there is no more unreliable sending of messages, if we really want to provide some unreliable message sending approach that is more real-time, we should probably use UDP (and UDPs send_to()
is sync).
Maybe the real discussion here, thinking about the long term (i.e. networked transport for channels), is if we want to really provide unreliable messages channels. We certainly will need reliable channels, to send power-setting commands for example. If we want unreliable channels, then I think having a sync send()
still makes sense. The sync / async
should be tied to that IMHO.
Actually, with the removal of Bidirectional
, now we only have 2 types of channels and 1 is sync (Broadcast
) and the other one is async (Anycast
). There is also an issue about having an option to make Broadcast
senders block too.
So I'm not sure if this is feasible or not, unless we have 2 different types of broadcast channels, one that blocks sends and one that doesn't, instead of making it just an option for the same channel.
Updating the issue description to mention this.
What's needed?
The
Sender
class provides anasync send()
method, ~but most (if not all) our senders could currently send synchronously~ [^1]. Beingasync
means that the control flow/CPU could be taken away from the caller, which is very inconvenient for cases where sync is guaranteed.[^1]: Actually, with the removal of
Bidirectional
, now we only have 2 types of channels and 1 is sync (broadcast) and the other one is async (anycast). There is also an issue about having an option to makeBroadcast
senders block too.Proposed solution
Split
Sender
intoSyncSender
andAsyncSender
(probably make them protocols instead of classes).SyncSender
would provide a syncsend()
andAsyncSender
would provide aasync asend()
.Use cases
Resampler
class) can also benefit from this, now the resampling functions need to beasync
becausesend()
isasync
: https://github.com/frequenz-floss/frequenz-sdk-python/blob/ce6d19b869ed26c4df1e08a332f4e4fa4cbfc199/src/frequenz/sdk/timeseries/_resampling.py#L711-L716send()
would drop like now andasend()
would wait until the queue has room for the message.Alternatives and workarounds
Currently the Data sourcing actor is just spawning tasks for sending and assuming they will eventually finish, and successfully, as there is no error handling for the spawned tasks and if they don't end, we'll be leaking dangling tasks forever.
Additional context
AsyncSender
s now, we will probably need them in the future, at least for this issue: #20sync send()
in the Data sourcing actor: