Document goals and non-goals of xtra

thomaseizinger commented 2 years ago

In https://github.com/Restioson/xtra/issues/119 and other places, we have discussions that ultimately come down to optimising for different goals.

I am opening this issue to propose and discuss a manifesto that states goals and non-goals and ranks properties against each other.

From the current issue description, xtra wants to be safe, small and performant. Taking inspiration from the Agile Manifesto, we can thus perhaps say that we value:

Safety over performance

Example: We would rather not use unsafe code to achieve a more performant implementation.

Small API surface over performance

Example: We would rather not add a "special" function to the public API that makes certain use cases more efficient if the use case can already be achieved with a different API.

Small API surface over "batteries-included"

Example: We would rather not include a convenience feature that can already be expressed with existing public APIs.

Convenient APIs over ordering guarantees

Example: async-await style APIs are convenient to use but it is hard to provide ordering guarantees once a task is spawned into an executor. Ordering guarantees more or less imply poll style APIs down to user-handlers but those are less convenient to use.

Orthogonal APIs over additional features

Example: We would rather not add a feature to xtra if it introduces APIs that are not orthogonal to an existing API. In other words, all APIs should be as orthogonal and modular as possible.

As with the Agile Manifesto, this list doesn't mean that we don't optimise for the items on the right. It means that when they conflict with the items on the left, we will favour the left.

Restioson commented 2 years ago

I definitely agree that we should clarify these, and I agree with most of the points!

Convenient APIs over ordering guarantees

Could you give an example of when this tradeoff would apply to xtra? I don't quite understand this one.

thomaseizinger commented 2 years ago

I definitely agree that we should clarify these, and I agree with most of the points!

Good to hear!

Convenient APIs over ordering guarantees

Could you give an example of when this tradeoff would apply to xtra? I don't quite understand this one.

I think this one may need a bit of fleshing out still. I have three thoughts on it:

I think the choice of letting users write their handlers with async-await means we give up some opportunities for the user to strictly control ordering.

Every await is a yield point, so the exact order in which a fleet of actors will process something is hard to predict. If I use a multi-threaded executor and send a message to two actors - both with split_receiver - then I have no guarantee which one will process the message first.

Ordering guarantees and asynchronous processing are fundamentally incompatible.

split_receiver is essentially saying "please process this message in an asynchronous manner relative to the current task". So even without an async-await style API, by opting into async processing the user accepts a reordering of operations and has to design their system to deal with that.

An actor system is generally not a good fit for systems which rely on strict ordering of operations.

Actors are a great way of structuring a highly concurrent program but they come with a trade-off: it is hard, and maybe next to impossible, to provide a guarantee on the order in which the system will process incoming messages. I think the reason lies in the combination of concurrency and the strong isolation that actors provide.
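To make the ordering point concrete, here is a minimal sketch that uses plain std threads and channels as stand-ins for actors. It does not use xtra or split_receiver; `run_once` and the "ping" message are hypothetical names chosen for illustration. Each toy actor processes its own mailbox in FIFO order, but nothing orders one actor's handler relative to another's.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns `actor_count` toy "actors" (plain threads, not xtra actors), sends
/// each of them one message, and returns the actor ids in the order in which
/// their handlers reported completion.
fn run_once(actor_count: usize) -> Vec<usize> {
    let (done_tx, done_rx) = mpsc::channel();
    let mut mailboxes = Vec::new();
    let mut handles = Vec::new();

    for id in 0..actor_count {
        let (tx, rx) = mpsc::channel::<&'static str>();
        let done_tx = done_tx.clone();
        mailboxes.push(tx);
        handles.push(thread::spawn(move || {
            // Within one actor, messages are handled in mailbox order...
            for _msg in rx {
                // ...but this handler runs concurrently with every other actor.
                done_tx.send(id).unwrap();
            }
        }));
    }
    // Drop the original sender so `done_rx` closes once all actors exit.
    drop(done_tx);

    // The sender addresses the actors in a fixed order: 0, 1, 2, ...
    for mailbox in &mailboxes {
        mailbox.send("ping").unwrap();
    }
    // Closing the mailboxes lets every actor loop terminate.
    drop(mailboxes);

    for handle in handles {
        handle.join().unwrap();
    }
    done_rx.iter().collect()
}
```

Calling `run_once(2)` always yields both ids, but whether the result is `[0, 1]` or `[1, 0]` depends on how the OS schedules the threads. A system built on top of this has to tolerate either outcome.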

I am not quite sure how to best capture this in the manifesto. In essence, I want to say: If you need strong ordering guarantees, don't use xtra because it is an actor system built on top of async-await and thus we can't provide strong ordering guarantees. I tried capturing this by saying, "we chose to provide a convenient API over strict ordering guarantees".

The async-await part is important: rust-libp2p is also designed around an actor system but it has a custom runtime where users need to implement a trait (NetworkBehaviour) that has event handlers for receiving messages and a poll function for making progress on its work. This trait represents an actor.

Because of the poll design, the runtime can control precisely when it will poll an actor, when it will deliver certain messages, etc.

This allows us to make optimisations like preferring messages generated by local code over messages sent from remote nodes (libp2p is a networking library). For example, outgoing connections are handled with priority over incoming ones.

The trade-off here, though, is that writing those actors is a bit more tedious because one has to write by hand the state machines that an async function would otherwise generate for you.
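As a rough illustration of that tedium, the sketch below writes the same tiny computation twice: once as an async fn and once as a hand-written state machine implementing `Future`. None of this is xtra or rust-libp2p code; `YieldOnce`, `add_async`, `AddByHand`, `NoopWaker` and `block_on` are hypothetical names, and `block_on` is a deliberately naive busy-polling executor that also counts how often it polls.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A future that returns `Pending` exactly once, i.e. a single yield point.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            Poll::Ready(())
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask to be polled again
            Poll::Pending
        }
    }
}

/// Convenient version: the compiler generates the state machine.
async fn add_async(a: u32, b: u32) -> u32 {
    YieldOnce { yielded: false }.await;
    a + b
}

/// Poll-style version: every state and transition is spelled out by hand.
enum AddByHand {
    Start { a: u32, b: u32 },
    Waiting { a: u32, b: u32, inner: YieldOnce },
    Done,
}

impl Future for AddByHand {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // fine here: all fields are `Unpin`
        loop {
            match &mut *this {
                AddByHand::Start { a, b } => {
                    let (a, b) = (*a, *b);
                    let inner = YieldOnce { yielded: false };
                    *this = AddByHand::Waiting { a, b, inner };
                }
                AddByHand::Waiting { a, b, inner } => match Pin::new(inner).poll(cx) {
                    Poll::Pending => return Poll::Pending,
                    Poll::Ready(()) => {
                        let sum = *a + *b;
                        *this = AddByHand::Done;
                        return Poll::Ready(sum);
                    }
                },
                AddByHand::Done => panic!("polled after completion"),
            }
        }
    }
}

/// Waker that does nothing; sufficient because `block_on` polls in a loop.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Naive executor: polls until ready, returning the output and the poll count.
fn block_on<F: Future>(fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

Both `block_on(add_async(2, 3))` and `block_on(AddByHand::Start { a: 2, b: 3 })` produce the same sum after the same number of polls. The hand-written form is longer, but whoever calls `poll` decides exactly when each step runs, which is the kind of control a poll-based runtime relies on.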

bobdebuildr commented 2 years ago

Hello! I've been following this project over the past few weeks, and am very interested in using it as soon as 0.6 is stable. What isn't so clear to me at the moment is the main aspects in which this project differs from other available actor frameworks. Perhaps a brief comparison to the most well-known ones would be in order? I didn't open a new issue because this question does seem to be related to your goals and non-goals.

As an example: to me (as somewhat of a noob when it comes to Actors in Rust), the architecture and even certain implementation aspects are very similar to actix (not actix-web), see e.g.: https://actix.rs/book/actix/sec-2-actor.html

Restioson commented 2 years ago

What isn't so clear to me at the moment is the main aspects in which this project differs from other available actor frameworks. Perhaps a brief comparison to the most well-known ones would be in order?

This could be nice to have, yea :)

As an example: to me (as somewhat of a noob when it comes to Actors in Rust), the architecture and even certain implementation aspects are very similar to actix (not actix-web), see e.g.: https://actix.rs/book/actix/sec-2-actor.html

It is similar to actix, yes. In fact, I first wrote xtra as a response to the incident in which the original maintainer of actix removed the repository and said that they would stop developing it. I won't go any further into this, as many articles have been written about it already that summarise it better than I ever could. To make a long story short, though: around this time, the fact that actix had a lot of unsound (not just unsafe) code was publicised. When that came to my attention, together with the uncertainty around the future of the project, I decided to write my own actor library. As a result, a lot of its features are similar to actix.

As for the differences, while I do believe that the unsoundness has been resolved in actix, xtra is a #![deny(unsafe_code)] crate, meaning that it should never cause unsoundness. It is also rather small, at around 2kloc, as we've discussed in this thread earlier on, whereas actix is quite a bit larger at around 5kloc, if I am not mistaken. I haven't been following actix's development since I started xtra, though, so I'm not sure what the main differences are anymore. It would definitely be worth putting some time into finding them and writing them up.
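For readers unfamiliar with the lint, here is a small sketch of what #![deny(unsafe_code)] does. xtra applies it at the crate root; the sketch scopes it to a module so the snippet stands alone. The `Mailbox` type is hypothetical and is not xtra's actual mailbox implementation.

```rust
mod mailbox {
    // Inner attribute: any `unsafe` block, fn or impl in this module is a
    // compile error. At the top of lib.rs it would cover the whole crate.
    #![deny(unsafe_code)]

    use std::collections::VecDeque;

    /// Hypothetical bounded FIFO mailbox built only from safe std collections.
    pub struct Mailbox<M> {
        queue: VecDeque<M>,
        capacity: usize,
    }

    impl<M> Mailbox<M> {
        pub fn new(capacity: usize) -> Self {
            Self {
                queue: VecDeque::with_capacity(capacity),
                capacity,
            }
        }

        /// Hands the message back to the caller if the mailbox is full.
        pub fn try_push(&mut self, msg: M) -> Result<(), M> {
            if self.queue.len() >= self.capacity {
                return Err(msg);
            }
            self.queue.push_back(msg);
            Ok(())
        }

        /// Removes and returns the oldest message, if any.
        pub fn pop(&mut self) -> Option<M> {
            self.queue.pop_front()
        }
    }
}
```

A hand-rolled ring buffer using unsafe could possibly avoid some bounds checks, but under "safety over performance" the safe `VecDeque` version would be the preferred trade-off.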

thomaseizinger commented 2 years ago

I'd be keen to put out a blogpost once we release 0.6 which we could also link to TWIR!

Said blogpost could mention a few key points that differentiate xtra from actix for example.

@bobdebuildr If you get a chance to play around with latest master before we release, we'd very much welcome any feedback on the redesigned APIs!

thomaseizinger commented 1 year ago

@Restioson Do you have any thoughts on the above?

If we agree on this, then we can simplify the implementation by removing the separate queue for default priority messages.

Restioson / xtra

Document goals and non-goals of xtra #121