MatrixAI / Polykey

Polykey Core Library
https://polykey.com
GNU General Public License v3.0

Async API for Notifications & Delay-Tolerant Synchronisation #248

Closed CMCDragonkai closed 1 year ago

CMCDragonkai commented 2 years ago

Specification

The notifications system is better understood as a message queueing system (https://en.wikipedia.org/wiki/Message_queue). Communication is currently implemented with gRPC, meaning notifications are built in the RPC paradigm.

There are some benefits to choosing a messaging paradigm instead.

All these models make it possible for one software component to affect the behavior of another component over a network. They are different in that RPC- and ORB-based middleware create systems of tightly coupled components, whereas MOM-based systems allow for a loose coupling of components. In an RPC- or ORB-based system, when one procedure calls another, it must wait for the called procedure to return before it can do anything else. In these synchronous messaging models, the middleware functions partly as a super-linker, locating the called procedure on a network and using network services to pass function or method parameters to the procedure and then to return results. https://en.wikipedia.org/wiki/Message-oriented_middleware

It's possible to build a messaging system on top of RPC. Actually, RPC and messaging are roughly equivalent, in that RPC can be built on top of a messaging protocol and messaging can be built on top of RPC. See https://github.com/yarpc/yarpc-go as an example of a messaging system on top of gRPC.

Right now our architecture has gRPC on top of a lower transport-level network domain. We could build a messaging system on top of gRPC, but we would be reimplementing a lot of features that other messaging protocols already have.

In addition to notifications, there's also the desire to build a lazy gossip protocol that helps us replicate and maintain an eventually consistent gestalt graph as well as ACL state.

We need to think holistically here and ensure that anything we build for an asynchronous messaging paradigm can be used for all these use cases and constraints.

A good place to start would be to review the AsyncAPI standard: https://www.asyncapi.com/docs/specifications/v2.2.0#definitionsProtocol. This standard describes asynchronous protocols, much as OpenAPI is used primarily to describe RESTful or RPC-style HTTP APIs.

We should also be aware of how this relates to synchronous transactions that are currently occurring in a blocking manner. If they can be redesigned to be asynchronous, we could improve UX and avoid blocking operations over the network.

Additional context

Tasks

  1. ...
  2. ...
  3. ...

CMCDragonkai commented 1 year ago

Our notification system is a sort of delay-tolerant synchronisation, or human-in-the-loop interaction.

We can think of notifications as an interaction that requires the human to take action before the interaction can complete.

As for being able to "send", it would require an outbox system that retries sending.

This is getting closer to reimplementing email... so we need to ensure we don't feature creep too much here.

Which is why looking into the AsyncAPI standard is a good idea: https://www.asyncapi.com/docs/concepts/message

Email as a protocol is really insecure and very clunky and old. It's just really hard to automate (wasn't designed to be machine-parseable).

CMCDragonkai commented 1 year ago

QUIC streams will make this cheap to do. We can "pull" to establish a stream, then react to data that is pushed over the stream.

As for delay tolerance, that will require something else. It's sort of like a buffer with a retry: a sort of outbox for operations.

It's possible we can re-use our task system to do the retry, and not have to develop domain-specific outboxes.

tegefaulkes commented 1 year ago

For delay-tolerant operations we can make use of the task scheduler. If we attempt some RPC operation, such as sending a notification, and it fails, we can schedule a task to retry it after some delay. If the task fails again, we can try again after a longer delay. There should be a threshold for attempts where we give up altogether.
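
The retry-with-growing-delay idea above could be sketched roughly as follows. This is only an illustration under assumptions: `RetryPolicy`, `decideRetry` and `sendWithRetry` are hypothetical names, not part of Polykey's task system, and the exponential growth factor is one possible choice of "longer delay".

```typescript
// Hypothetical retry policy; none of these names come from Polykey's task system.
type RetryPolicy = {
  baseDelayMs: number; // delay before the first retry
  factor: number; // multiplier applied to the delay after each further failure
  maxAttempts: number; // total attempts (including the first) before giving up
};

type RetryDecision =
  | { kind: 'retry'; delayMs: number }
  | { kind: 'giveUp' };

// Decide what to do after `failedAttempts` failures (1 = the first attempt failed).
function decideRetry(
  failedAttempts: number,
  policy: RetryPolicy,
): RetryDecision {
  if (failedAttempts >= policy.maxAttempts) {
    return { kind: 'giveUp' };
  }
  // Each failure lengthens the delay: base, base * factor, base * factor^2, ...
  const delayMs =
    policy.baseDelayMs * Math.pow(policy.factor, failedAttempts - 1);
  return { kind: 'retry', delayMs };
}

// Run an operation in-process, retrying according to the policy.
// `sleep` is injectable so a scheduler or a test can replace the timer.
async function sendWithRetry<T>(
  op: () => Promise<T>,
  policy: RetryPolicy,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let failedAttempts = 0;
  while (true) {
    try {
      return await op();
    } catch (e) {
      failedAttempts++;
      const decision = decideRetry(failedAttempts, policy);
      if (decision.kind === 'giveUp') throw e; // threshold reached
      await sleep(decision.delayMs);
    }
  }
}
```

An in-process loop like `sendWithRetry` does not survive a restart. A persistent task scheduler would presumably store the failed-attempt count alongside the task and call something like `decideRetry` each time the task fails, rescheduling itself with the returned delay.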

CMCDragonkai commented 1 year ago

  1. Notifications is a queue
  2. RPC is now JSON RPC - it's not asynchronous, and requires the other side to be answering
  3. Delay tolerance can work in 3 ways: sender queue, intermediate queue, receiver queue
  4. Sender queue would be the task manager
  5. Intermediate queue - not sure where this would apply atm... primarily in terms of synchronisation, like gossip or vault sync where 2/3 nodes synced up
  6. Receiver queue would be the notifications, notifications may need to be read and actioned out
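
To make the sender and receiver queues in the list above concrete, here is a minimal in-memory sketch. Every name in it (`QueuedNotification`, `Outbox`, `Inbox`, `flush`, `action`) is hypothetical and chosen for illustration; in Polykey the sender side would be the task manager and the receiver side the notifications domain, neither of which is modelled here.

```typescript
// All names are hypothetical; this is an in-memory model, not Polykey's API.
type QueuedNotification = { id: string; from: string; body: string };

// Receiver queue: holds notifications until they are read and actioned.
class Inbox {
  private pending: QueuedNotification[] = [];

  receive(n: QueuedNotification): void {
    this.pending.push(n);
  }

  // Remove and return the oldest unactioned notification, if any.
  action(): QueuedNotification | undefined {
    return this.pending.shift();
  }

  get size(): number {
    return this.pending.length;
  }
}

// Sender queue: holds notifications until delivery succeeds.
class Outbox {
  private pending: QueuedNotification[] = [];

  enqueue(n: QueuedNotification): void {
    this.pending.push(n);
  }

  // Attempt delivery of everything pending; keep whatever fails for a later flush.
  // Returns the number of notifications delivered in this pass.
  flush(deliver: (n: QueuedNotification) => boolean): number {
    const failed: QueuedNotification[] = [];
    let delivered = 0;
    for (const n of this.pending) {
      if (deliver(n)) {
        delivered++;
      } else {
        failed.push(n);
      }
    }
    this.pending = failed;
    return delivered;
  }

  get size(): number {
    return this.pending.length;
  }
}
```

The `deliver` callback stands in for the RPC call to the receiving node: it returns `false` when the other side is not answering, in which case the notification stays in the outbox until the next `flush`.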

Therefore I don't think we need any API changes. We have all the building blocks.

Async API could be useful to standardise our notification-related API.

But right now that's going to be overkill (we also have a lot of other API-related behaviour that has nothing to do with async calls), so I'm going to say wontfix atm.

One could certainly use our JSON RPC API to build the async API.

I think we will have a better idea of what exactly we need when we spec out the interaction between PKE and Polykey as well as Polykey-Desktop, where the CLI is just one interface. And the CLI interface is very batchy, so it doesn't even make use of async API much, but PKE and desktop will do so.

Will revisit when needed. @amydevs

CMCDragonkai commented 1 week ago

This ended up being implemented (to some extent) in #695.