Async API for Notifications & Delay-Tolerant Synchronisation

CMCDragonkai commented 2 years ago

Specification

The notifications system is better understood as a message queueing system (https://en.wikipedia.org/wiki/Message_queue). Communications are currently implemented with gRPC, meaning notifications are built in the RPC paradigm.

There are some benefits to choosing a messaging paradigm instead.

All these models make it possible for one software component to affect the behavior of another component over a network. They are different in that RPC- and ORB-based middleware create systems of tightly coupled components, whereas MOM-based systems allow for a loose coupling of components. In an RPC- or ORB-based system, when one procedure calls another, it must wait for the called procedure to return before it can do anything else. In these synchronous messaging models, the middleware functions partly as a super-linker, locating the called procedure on a network and using network services to pass function or method parameters to the procedure and then to return results. https://en.wikipedia.org/wiki/Message-oriented_middleware

It's possible to build a messaging system on top of RPC. In fact, RPC and messaging are roughly equivalent, in that RPC can be built on top of a messaging protocol and messaging can be built on top of RPC. See https://github.com/yarpc/yarpc-go as an example of a messaging system on top of gRPC.

Right now our architecture has gRPC on top of a lower transport-level network domain. We could build a messaging system on top of gRPC, but we would be reimplementing a lot of features that other messaging protocols already have.

In addition to notifications, there's also the desire to build a lazy gossip protocol that helps us replicate the gestalt graph and ACL state and keep them eventually consistent.

We need to think holistically here and ensure that anything we build for an asynchronous messaging paradigm can be used for all these use cases and constraints:

Notification messaging
ACL & GG synchronisation
CQRS/streaming & consistency to clients (CLI & GUI)
Wide-spread interoperability
Open standards
JS-runtime and potentially usable directly in the browser and native script and mobile platforms
How does pagination come into play here?

A good place to start would be to review the AsyncAPI standard: https://www.asyncapi.com/docs/specifications/v2.2.0#definitionsProtocol. This standard is about describing asynchronous protocols similar to OpenAPI which is used primarily for RESTful or RPCful HTTP APIs.

We should also be aware of how this relates to synchronous transactions that are currently occurring in a blocking manner. If they can be redesigned to be asynchronous, we could improve UX and avoid blocking operations over the network.

Additional context

#190
#185
#166 - HTTP API considerations
#155
https://github.com/MatrixAI/js-polykey/issues/213#issuecomment-898203766 - Discussions relating to node claiming process currently being synchronous
#243 - deadlines in node claim process
#225 - nodes domain architecture will be impacted by any API changes between nodes
https://github.com/asyncapi/spec/issues/64#issuecomment-942915747 - Discussions on protocols used in AsyncAPI
https://news.ycombinator.com/item?id=27825888 - Discussions about SMTP being a widespread messaging protocol

Tasks

...
...
...

CMCDragonkai commented 1 year ago

Our notification system is a sort of delay-tolerant synchronisation, or human-in-the-loop interaction.

We can think of notifications as an interaction that requires the human to take action before the interaction can complete.

As for being able to "send", it would require an outbox system that retries sending.

This is getting closer to reimplementing email... so we need to ensure we don't feature creep too much here.

Which is why looking into the AsyncAPI standard is a good idea: https://www.asyncapi.com/docs/concepts/message

Email as a protocol is really insecure and very clunky and old. It's just really hard to automate (wasn't designed to be machine-parseable).

CMCDragonkai commented 1 year ago

QUIC streams will make this cheap to do. We can "pull" to establish a stream, then react to data that is pushed over the stream.
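
A minimal sketch of that pull-then-push shape, using an in-memory stand-in rather than a real QUIC stream. `PushSource`, `open` and `push` are hypothetical names made up for illustration; they are not part of Polykey or of any QUIC library.

```typescript
// Hypothetical in-memory stand-in for a peer that serves a stream.
type Handler<T> = (data: T) => void;

class PushSource<T> {
  private handlers: Handler<T>[] = [];

  // The "pull": the client opens the stream once and gets back a function that closes it.
  open(onData: Handler<T>): () => void {
    this.handlers.push(onData);
    return () => {
      this.handlers = this.handlers.filter((h) => h !== onData);
    };
  }

  // The "push": the serving side writes data to every stream that is still open.
  push(data: T): void {
    this.handlers.forEach((h) => h(data));
  }
}

const source = new PushSource<string>();
const received: string[] = [];
const closeStream = source.open((msg) => {
  received.push(msg);
});

source.push("a");
source.push("b");
closeStream();
source.push("c"); // nobody is listening any more, so this is dropped

console.log(received); // [ 'a', 'b' ]
```

Note that the last push is simply lost once the stream is closed, which is the gap the delay-tolerance discussion is about.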

As for delay tolerance, that will require something else. It's sort of like a buffer with a retry: a sort of outbox for operations.

It's possible we can re-use our task system to do the retry, and not have to develop domain-specific outboxes.

tegefaulkes commented 1 year ago

For delay-tolerant operations we can make use of the task scheduler. If we attempt some RPC operation, such as sending a notification, and it fails, we can schedule a task to retry it after some delay. If the task fails again, we can try again after a longer delay. There should be a threshold for attempts where we give up altogether.
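
As a rough sketch of that retry rule, the delay can grow geometrically until an attempt threshold is hit. `RetryPolicy`, `nextRetryDelay` and the numbers below are assumptions for illustration only; they are not the task scheduler's actual API, and the real delays and threshold would need to be decided.

```typescript
// Hypothetical retry policy; field names and values are illustrative.
interface RetryPolicy {
  baseDelayMs: number; // delay before the first retry
  factor: number; // multiplier applied after each further failure
  maxAttempts: number; // total attempts (first try included) before giving up
}

// Returns the delay before the next retry, or undefined once the threshold is reached.
function nextRetryDelay(policy: RetryPolicy, failedAttempts: number): number | undefined {
  if (failedAttempts >= policy.maxAttempts) return undefined;
  return policy.baseDelayMs * Math.pow(policy.factor, failedAttempts - 1);
}

// Lists the delays a scheduler would use for an operation that keeps failing.
function retrySchedule(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let failed = 1; ; failed++) {
    const delay = nextRetryDelay(policy, failed);
    if (delay === undefined) break;
    delays.push(delay);
  }
  return delays;
}

const policy: RetryPolicy = { baseDelayMs: 1000, factor: 2, maxAttempts: 4 };
console.log(retrySchedule(policy)); // [ 1000, 2000, 4000 ]
```

With these example values the operation is tried four times in total: once immediately, then after 1s, 2s and 4s, after which it is dropped.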

CMCDragonkai commented 1 year ago

Notifications is a queue
RPC is now JSON - it's not asynchronous, but requires other side to be answering
Delay tolerance can work in 3 ways: sender queue, intermediate queue, receiver queue
Sender queue would be the task manager
Intermediate queue - not sure where this would apply atm... primarily in terms of synchronisation, like gossip or vault sync where 2/3 nodes synced up
Receiver queue would be the notifications, notifications may need to be read and actioned out
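
To make the sender and receiver queues concrete, here is a small in-memory sketch. `Outbox`, `Inbox` and `QueuedNotification` are hypothetical names; in practice the sender side would be the task manager and the receiver side the notifications domain, neither of which is modelled here.

```typescript
// Hypothetical message shape for illustration.
type QueuedNotification = { id: number; body: string };

// Receiver queue: notifications wait here until they are read and actioned.
class Inbox {
  private unread: QueuedNotification[] = [];
  receive(n: QueuedNotification): void {
    this.unread.push(n);
  }
  readAll(): QueuedNotification[] {
    const out = this.unread;
    this.unread = [];
    return out;
  }
}

// Sender queue: messages stay queued until delivery succeeds.
class Outbox {
  private pending: QueuedNotification[] = [];
  private deliver: (n: QueuedNotification) => boolean;
  constructor(deliver: (n: QueuedNotification) => boolean) {
    this.deliver = deliver;
  }
  send(n: QueuedNotification): void {
    this.pending.push(n);
    this.flush();
  }
  // Delivers in order, stops at the first failure, returns how many are still queued.
  flush(): number {
    while (this.pending.length > 0) {
      const head = this.pending[0];
      if (head === undefined || !this.deliver(head)) break;
      this.pending.shift();
    }
    return this.pending.length;
  }
}

const inbox = new Inbox();
let peerOnline = false;
const outbox = new Outbox((n) => {
  if (!peerOnline) return false; // stands in for a failed RPC call
  inbox.receive(n);
  return true;
});

outbox.send({ id: 1, body: "gestalt invite" }); // peer offline: stays in the outbox
peerOnline = true;
outbox.flush(); // a later retry delivers it
console.log(inbox.readAll().length); // 1
```

Stopping at the first failed delivery keeps messages in order; whether ordering matters for notifications is an open design choice.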

Therefore I don't think we need any API changes. We have all the building blocks.

Async API could be useful to standardise our notification-related API.

But right now that's going to be overkill (we also have a lot of other API-related behaviour that has nothing to do with async calls), so I'm going to say wontfix atm.

One could certainly use our JSON RPC API to build the async API.

I think we will have a better idea of what exactly we need when we spec out the interaction between PKE and Polykey as well as Polykey-Desktop, where the CLI is just one interface. And the CLI interface is very batchy, so it doesn't even make use of async API much, but PKE and desktop will do so.

Will revisit when needed. @amydevs

CMCDragonkai commented 1 week ago

This ended up being implemented (to some extent) in #695.

MatrixAI / Polykey