Closed zeenix closed 1 year ago
In GitLab by @elmarco on Sep 16, 2020, 08:01
- a non-blocking version of `Connection::receive_message`, that returns `std::io::ErrorKind::WouldBlock` if there is nothing to be read, rather than blocking while waiting for a message to arrive
Perhaps we should have a `Connection::set_nonblocking` API.
- access to a file descriptor that the app can then register in its own event loop (backed by `epoll` or equivalent). I see that `Connection` already implements `AsRawFd`, so I suspect this is already possible?
Yes, and that's how I integrate zbus in a loop today: just listen for IN events and call `try_handle_next`. But I am not sure we want to advertise/support this method too much, as in the future we may have timers or multiple fds (for the server). And I also don't know how this plays with async. But for now, that's working quite fine.
as in the future we may have timers
What kind of timers? The generic timer (as in `g_timeout_add`) I would leave to async runtimes.
or multiple fds (for the server).
Wouldn't you just use multiple connections for that? We could create some kind of grouping type for `Connection` as well.
And also I don't know how this plays with async.
That would be my only worry here actually. But wouldn't this allow us to easily integrate with async runtimes as well? @sdroege could use your input here.
In GitLab by @sdroege on Sep 16, 2020, 13:36
@sdroege could use your input here.
What @levans describes can be integrated into any of the futures runtimes apart from `async-std`, where the necessary API exists but is only available privately, not publicly.
Having a way to get an `fd` and non-blocking functions returning `WouldBlock` should be possible to integrate anywhere. Note that using this API would require `unsafe` code, so it would really be just the lower integration layer.
An alternative to the `fd` would be to have another way of signalling that the operation can be retried (which is basically what the futures `Waker`s are, GLib `GSource`s, etc.). That could be done via callbacks, channels, etc.
Nonetheless, I think having a proper async `Future`s API on top of zbus would be useful. But the above could be used as the lower integration layer on top of which it could be implemented.
What kind of timers? The generic timer (as in `g_timeout_add`) I would leave to async runtimes.
Yes, all of them have mechanisms for timeouts and there's not much point in replicating this here. And if you work based on the `fd`... `timerfd` or the `timeout` parameter of `epoll()` is nearby.
Perhaps we should have a `Connection::set_nonblocking` API.
This is a potential thread-safety issue, be careful with that :) Maybe just allow setting that during construction?
but don't use Rust's async/await
You'll probably have a hard time integrating with other things then, but :shrug:
In GitLab by @sdroege on Sep 16, 2020, 13:46
into any of the futures runtimes apart from `async-std`
Should actually be possible via one of the `Unix*` impls of `FromRawFd` nowadays.
@sdroege thanks for the detailed input. :thumbsup: It really helps. Based on your input and discussion so far, I think we should allow this. @levans If you could contribute a merge request for this, that would be super awesome!
Perhaps we should have a `Connection::set_nonblocking` API.
This is a potential thread-safety issue, be careful with that :) Maybe just allow setting that during construction?
True, let's make it construct-only. We'll probably want a separate constructor for setting it though, so we don't need to break API.
In GitLab by @levans on Sep 16, 2020, 17:32
but don't use Rust's async/await
You'll probably have a hard time integrating with other things then, but :shrug:
Well, the async/await construct of Rust assumes a certain way of processing data, centered on waiting for things you are expecting to arrive (an I/O to complete, a response to an RPC call, etc.). In a project such as Smithay, we are instead in a context of quickly reacting to multiple unpredictable sources of events. In that case, I've found callback-based event loop designs much more convenient to work with, hence my development of calloop.
@levans If you could contribute a merge request for this, that would be super awesome!
Do I understand correctly that this is just a matter of adding an alternate constructor that sets the socket non-blocking? Or would this have nontrivial interactions with other parts of the logic that would require careful consideration?
@levans If you could contribute a merge request for this, that would be super awesome!
Do I understand correctly that this is just a matter of adding an alternate constructor that sets the socket non-blocking? Or would this have nontrivial interactions with other parts of the logic that would require careful consideration?
I was actually hoping you'd be able to tell us. :) AFAICT, yes that should be it. I'd suggest you start there and see if it works for you. If it works for you and it doesn't break our tests, I think we're good. I hope you'll also add a new test for this behavior so we don't later break this for you. :)
TL;DR: Let's try and find out. :)
In GitLab by @sdroege on Sep 16, 2020, 18:54
You'd need to add code for making the underlying socket non-blocking, and possibly for handling continuations of reads/writes if a previous partial read/write returned `WouldBlock`.
In GitLab by @levans on Sep 16, 2020, 20:07
So, after looking a little into zbus code:
Handling partial reads and writes will require either adding two buffers to `ConnectionInner` to handle them, or making some wrapper around `UnixStream` that handles the buffering. The second option seems cleaner to me, though it'd require some internal refactoring of `send_message` and `receive_message`. I have in mind something similar to the `BufferedSocket` abstraction I have in wayland-rs (but simpler, as D-Bus messages can be parsed without requiring external metadata).
Thinking about this: for the Wayland protocol we always read the socket with the `MSG_DONTWAIT` option of `recvmsg`, and just `poll` it whenever we need to actually block. This is thus a setup where non-blocking is the default, and we only block when we actually want to.
Doing something similar in zbus would make it easy to handle the initial handshake and `Connection::call_method` (which really require blocking), while keeping the codepath differences between blocking and non-blocking to a minimum.
This is becoming a nontrivial refactor, though, so I'd like to hear your thoughts on it. I guess this could be used as decent groundwork for async/await integration?
In GitLab by @sdroege on Sep 16, 2020, 21:38
Doing something similar in zbus would make it easy to handle the initial handshake and `Connection::call_method` (which really require blocking), while keeping the codepath differences between blocking and non-blocking to a minimum.
Why do they really require blocking? Ideally, all parts of the API should be non-blocking.
In GitLab by @levans on Sep 16, 2020, 21:53
Well, I assume they require blocking, as:

- `Connection::call_method` reads the socket and stores messages in an internal queue until it gets the reply to the method, and returns that. I don't see how this behavior can be preserved without blocking.
- `new_unix_client` and `new_unix_server` need to exchange a few messages with the D-Bus daemon to initialize the connection. Again, that seems difficult to achieve without blocking, if the connection is expected to be initialized once they return.

In GitLab by @sdroege on Sep 16, 2020, 22:08
Sounds like there should be a lower-level API then that only returns some kind of handshake object that is non-blocking and you have to call it whenever its fd is ready until the handshake is done :)
`Connection::call_method` reads the socket and stores messages in an internal queue until it gets the reply to the method, and returns that. I don't see how this behavior can be preserved without blocking.
That's tricky indeed, but we'll need a solution for that too. I was going to suggest adding a `call_method_async`, but that would be confusing when we later add real async support (as in async/await). I actually imagine that name being a perfect fit for it even.
`Connection::call_method` reads the socket and stores messages in an internal queue until it gets the reply to the method, and returns that. I don't see how this behavior can be preserved without blocking.
That's tricky indeed, but we'll need a solution for that too. I was going to suggest adding a `call_method_async`, but that would be confusing when we later add real async support (as in async/await). I actually imagine that name being a perfect fit for it even.
Having said that, it's no biggie if sending methods remain blocking for now.
@levans so will you be working on this, or attempting to?
In GitLab by @levans on Sep 18, 2020, 15:41
Yes, I can give it a shot over the weekend.
I'd just like to make sure we are on the same page before starting to work on that. So, what I currently have in mind is:
- A `Connection` in non-blocking mode, with an intermediate object representing the handshake, as described by @sdroege
- Change `Connection::call_method` so that, when the socket returns `EWOULDBLOCK`, it uses `poll` to wait on it until the appropriate message is received (and adjust its documentation to explain that it will block even if the connection is in non-blocking mode)

How does that sound?
@levans Sounds good to me. Just one idea though:
Change `Connection::call_method` so that, when the socket returns `EWOULDBLOCK`, it uses `poll` to wait on it until the appropriate message is received (and adjust its documentation to explain that it will block even if the connection is in non-blocking mode)
I wonder if we can instead return a `WillBlock` error from `call_method`, with the serial number (and any other context needed to resume the same call later) contained in the error?
EDIT: and provide another method to resume the `call_method`.
In GitLab by @levans on Sep 18, 2020, 17:09
I wonder if we can instead return a `WillBlock` error from `call_method`, with the serial number (and any other context needed to resume the same call later) contained in the error?
Wouldn't that be a breaking change though?
I wonder if we can instead return a `WillBlock` error from `call_method`, with the serial number (and any other context needed to resume the same call later) contained in the error?
Wouldn't that be a breaking change though?
Not really, since the user will explicitly create a non-blocking connection. The method already returns a `Result` so no API break is needed either. Unless I'm missing something?
I wonder if we can instead return a `WillBlock` error from `call_method`, with the serial number (and any other context needed to resume the same call later) contained in the error?
Wouldn't that be a breaking change though?
Not really, since the user will explicitly create a non-blocking connection. The method already returns a `Result` so no API break is needed either. Unless I'm missing something?
Oh, and I think `receive_message` should be the same, so the API is uniform. If you create a non-blocking connection, the methods behave accordingly, and if you didn't, they work in the blocking manner they currently do.
In GitLab by @levans on Sep 18, 2020, 17:44
Not really, since the user will explicitly create a non-blocking connection. The method already returns a `Result` so no API break is needed either. Unless I'm missing something?
Unless I'm missing something, wouldn't that imply adding a new variant to the error type? Or does the `WillBlock` error you're mentioning, with a field for the serial number, already exist?
Not really, since the user will explicitly create a non-blocking connection. The method already returns a `Result` so no API break is needed either. Unless I'm missing something?
Unless I'm missing something, wouldn't that imply adding a new variant to the error type?
Correct. However, now that I think of it, it'll be a bit strange to have error variants specific to these calls with different context in them. Maybe it's better to add separate non-blocking versions of both that return enum types holding either the final result or an in-progress context? But then how should the existing blocking versions behave if the connection is created to be non-blocking? Hoping @sdroege can guide us here, as I'm mostly just making (almost) educated guesses. :)
In GitLab by @sdroege on Sep 18, 2020, 19:31
As hinted on IRC, I would do this a bit differently and start by first writing a lower-level API for the "handshake" and "connection" that would then be used to implement the current blocking API, and could also be used for implementing any kind of non-blocking/async API. The design idea is basically what `tungstenite` is doing (at its lower-level API).

- Some trait `S` that abstracts away whatever you need from the underlying socket. Having this will make it a lot easier to handle integration into all the async runtimes, and probably also into @levans's project, as the detail that there's an `fd` behind it is completely abstracted away and the user of that API has to take care of it instead.
- Some `ClientHandshake<S>` type that is created for starting a connection. It would have `fn handshake(&mut self) -> Result<Connection<S>, Error>`, which the caller would call in a loop. The `Error` could be `WouldBlock`, in which case the caller can do whatever they have to do, and when the socket is ready again the function can be called again to continue establishing the connection.
- Some `Connection<S>` type that is created from the above. This would have a couple of functions:
  - `fn read_message(&mut self) -> Result<Message, Error>`, which again would return `WouldBlock` if nothing can be read, like now.
  - `fn queue_message(&mut self, message: Message) -> Result<(), Error>`, which queues up a message, or errors out in certain circumstances. One could imagine limiting the queue size.
  - `fn flush_pending(&mut self) -> Result<(), Error>`, which would write out messages that are queued up, or return `WouldBlock` if nothing can be written anymore.
- The caller would use `epoll()`/`select()` to know whenever a new thing can be read/written from the underlying socket. Your existing sync API should be trivially implementable around that, and you probably want to make use of `epoll()` or similar in there too, to ensure that your operations can time out.
- You probably also want some kind of `ServerHandshake<S>` that would do the server side of the handshake; the caller would wrap the socket/etc. it gets from `accept()` around that.
- A `Message` type, which is an enum representing the D-Bus messages: method calls, method replies, signals, etc.

Implementing your current API around that should be trivial, implementing integration into async runtimes is trivial (for which you might want to look at `tonic` and similar RPC/IPC things btw for API ideas; generally some kind of actor-model inspired API would fit well), implementing integration into `epoll()`-style loops is trivial, and unit testing the whole thing is trivial (you can have a mock "socket").
In GitLab by @levans on Sep 18, 2020, 19:46
Hmm, so the `Connection<S>` you describe looks a lot like the internal abstraction I had in mind, so there's that. :)
Now, I'm not really sure I get what you have in mind wrt the trait and the `S` type parameter. What other than an fd would be abstracted over? Or is it a way for the runtime to insert its own, runtime-aware, abstraction over the fd?
In GitLab by @sdroege on Sep 18, 2020, 21:59
What other than an fd would be abstracted over?
Anything that can implement the trait: `UnixDatagram`, your test thingy, something around the async `UnixDatagram` from tokio/async_std, a TLS connection (want to do D-Bus over the Internet?), some mock thing used for zbus tests, ...
Most things are not an fd, and fds are very bad abstractions :)
I'm not really sure I get what you have in mind wrt the trait and the `S` type parameter.
The trait would define the operations that you need to have provided by the underlying socket. So I guess a way to send/receive packets, and a way to attach fds to packets and retrieve them again.
Thanks so much @sdroege for detailing the needs for this effort. Some comments below:
- Some trait `S` that abstracts away whatever you need from the underlying socket. Having this will make it a lot easier to handle integration into all the async runtimes, and probably also into @levans's project, as the detail that there's an `fd` behind it is completely abstracted away and the user of that API has to take care of it instead.

Right, let's call this trait `Connectable`?
- Some `Connection<S>` type that is created from the above. This would have a couple of functions.

To avoid confusion with `Connection`, let's call this `Connecting` or `Connector` (better name suggestions welcome, of course).
- `fn read_message(&mut self) -> Result<Message, Error>`, which again would return `WouldBlock` if nothing can be read, like now.

Let's keep it consistent and call it `receive_message`.
- You probably also want some kind of `ServerHandshake<S>` that would do the server side of the handshake; the caller would wrap the socket/etc. it gets from `accept()` around that.

I don't think that's needed right now, and it probably never will be, since we don't have any plans of implementing a D-Bus broker, and in the case of p2p we skip the handshake altogether. But no harm in leaving a comment about this, in case someone wants to write a zbus-based broker (zbroker :laughing:) in the future.
- A `Message` type, which is an enum representing the D-Bus messages: method calls, method replies, signals, etc.

We already have this type. It's not an enum, though, but a `Vec<u8>` wrapper, which allows for incremental build-up; that should be useful for our non-blocking plans here.
In GitLab by @sdroege on Sep 19, 2020, 18:58
To avoid confusion with `Connection`, let's call this `Connecting` or `Connector` (better name suggestions welcome, of course).

Rust has namespaces ;) I would have all this hidden away in some kind of module that makes it clear that this is not for normal users of the API. Nothing wrong with having two `Connection` types at different levels of the API. You're going to have yet another one in a future async API anyway.
To avoid confusion with `Connection`, let's call this `Connecting` or `Connector` (better name suggestions welcome, of course).
Rust has namespaces ;) I would have all this hidden away in some kind of module that makes it clear that this is not for normal users of the API. Nothing wrong with having two `Connection` types at different levels of the API. You're going to have yet another one in a future async API anyway.
True, didn't think of that.
In GitLab by @levans on Sep 19, 2020, 21:33
So, I gave a first shot at an implementation in !169.
It does not yet follow the naming scheme you've discussed here, I just named things as was most convenient to me during drafting. I plan to rename everything at the end, once all deeper structural / API questions are solved.
This got fixed by !169.
In GitLab by @levans on Sep 15, 2020, 23:02
Some apps may wish to integrate a D-Bus connection into their event loop, but don't use Rust's async/await nor the associated runtimes.
Making zbus usable for such apps mostly implies providing two bits of API:

- a non-blocking version of `Connection::receive_message`, that returns `std::io::ErrorKind::WouldBlock` if there is nothing to be read, rather than blocking while waiting for a message to arrive
- access to a file descriptor that the app can then register in its own event loop (backed by `epoll` or equivalent). I see that `Connection` already implements `AsRawFd`, so I suspect this is already possible?

Would that be in-scope for zbus?