Integrate Observables to coordinate push-based Dataflow (with reactive properties)

CMCDragonkai commented 1 year ago

Specification

The EventEmitter is a nodejs native construct. To be more compatible, we should be using Web APIs like EventTarget.

We've started to do this in https://github.com/MatrixAI/Polykey/pull/438#issuecomment-1235041552.

There are some differences:

Our own usage of EventBus is pretty simple. It does, however, add the ability to emit events asynchronously and return a promise that indicates when the event has been received and handled, which makes it closer to an observer pattern. I believe we will want to use EventBus in a number of ways:

Default usecase of a synchronous observer pattern (by default, emitting an event equals executing all synchronous listeners before proceeding)
Asynchronous pub/sub (emitting events inside asynchronous code, and expecting handlers to be asynchronous)
Asynchronous observer pattern, emitting events and then waiting on all listeners to complete

Basically, the event emitter can already register both synchronous and asynchronous listeners. The event bus just extends the event emitter with the ability to emit asynchronously as well as synchronously.

Once we change to using event target, the same exact usecases must still be available.
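The three use cases can be sketched on top of EventTarget alone. Because `dispatchEvent` ignores listener return values, this sketch lets listeners register promises on the event itself, in the spirit of `ExtendableEvent.waitUntil` from service workers. `WaitableEvent`, `AsyncEventTarget`, `dispatchAndWait` and the `rootKeyChange` event name are hypothetical names used for illustration only; they are not Polykey APIs.

```typescript
// Hypothetical event type: listeners may register async work on it.
class WaitableEvent<T> extends Event {
  public readonly detail: T;
  public readonly pending: Array<Promise<unknown>> = [];
  constructor(type: string, detail: T) {
    super(type);
    this.detail = detail;
  }
  // Called by asynchronous listeners to announce work the emitter may await.
  public waitUntil(work: Promise<unknown>): void {
    this.pending.push(work);
  }
}

// Hypothetical EventTarget extension.
class AsyncEventTarget extends EventTarget {
  // Plain `dispatchEvent` already covers the synchronous observer pattern and
  // fire-and-forget pub/sub: listeners run synchronously, async work is not awaited.
  // This method covers the asynchronous observer pattern: dispatch, then wait
  // for all async work that listeners registered.
  public async dispatchAndWait<T>(event: WaitableEvent<T>): Promise<void> {
    this.dispatchEvent(event);
    await Promise.all(event.pending);
  }
}

const log: string[] = [];
const bus = new AsyncEventTarget();
bus.addEventListener('rootKeyChange', () => {
  log.push('sync listener');
});
bus.addEventListener('rootKeyChange', (e) => {
  const event = e as WaitableEvent<string>;
  event.waitUntil(
    Promise.resolve().then(() => {
      log.push(`async listener: ${event.detail}`);
    }),
  );
});

// Resolves only after the async listener's registered work has completed.
const done = bus.dispatchAndWait(new WaitableEvent('rootKeyChange', 'new-key'));
```

The trade-off in this design is that asynchronous listeners must opt in by calling `waitUntil`; a listener that starts async work without registering it is treated as fire-and-forget.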

Still our primary usecase will be to wire up reactivity between the different domains in Polykey without hardcoding them. This represents a push-based API.

Additional context

https://github.com/MatrixAI/Polykey/pull/438#issuecomment-1235041552 - using event target as part of abort signal and also in our tasks system
https://stackoverflow.com/questions/24340892/the-difference-between-a-message-queue-eventbus-and-a-pub-sub?noredirect=1&lq=1
MatrixAI/Polykey#386 - dealing with root key pair changes

Tasks

Create js-events to introduce an Evented decorator that can be used to augment classes with EventTarget functionality, and AbstractEvent to allow a common foundation for our typed events.
Integrate js-events into js-async-init, so that all decorated async-init now automatically supports events and properly dispatches Start, Started, Stop, Stopped, Destroy, Destroyed events in relation to the lifecycle of domain objects.
Identify evented objects/domains that should then have a events.ts file that lists all the typed events with EventX. Such event classes should extend the AbstractEvent.
Apply pull/push architecture to break any mutual dependency that can be neither (a) merged nor (b) factored out.
Research how Observable can then build on top of Evented decorator, and instead be application specific, rather than exposing them across all of the libraries. An Observed decorator could apply to existing Evented decorators wrapping around all of the addEventListener and dispatchEvent methods... or expose property or method decorators that do similar.
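As a rough sketch of what the first three tasks could look like, the following uses a mixin function in place of a decorator so that it does not depend on any decorator compiler setting. All names, signatures and event type strings here are assumptions made for illustration; the eventual js-events and js-async-init APIs may differ, and real lifecycle methods would be asynchronous.

```typescript
// Hypothetical common base for typed events, carrying a typed `detail`.
abstract class AbstractEvent<T = undefined> extends Event {
  public readonly detail: T;
  constructor(type: string, detail: T) {
    super(type);
    this.detail = detail;
  }
}

// Hypothetical lifecycle events, as they might be listed in an `events.ts`.
class EventStart extends AbstractEvent {
  constructor() { super('start', undefined); }
}
class EventStarted extends AbstractEvent {
  constructor() { super('started', undefined); }
}
class EventStop extends AbstractEvent {
  constructor() { super('stop', undefined); }
}
class EventStopped extends AbstractEvent {
  constructor() { super('stopped', undefined); }
}

type Constructor = new (...args: any[]) => object;

// Hypothetical `Evented`, written as a mixin; a class decorator would wrap
// the class in the same way.
function Evented<TBase extends Constructor>(Base: TBase) {
  return class extends Base {
    public readonly eventTarget = new EventTarget();
    public addEventListener(type: string, listener: (e: Event) => void): void {
      this.eventTarget.addEventListener(type, listener);
    }
    public removeEventListener(type: string, listener: (e: Event) => void): void {
      this.eventTarget.removeEventListener(type, listener);
    }
    public dispatchEvent(event: Event): boolean {
      return this.eventTarget.dispatchEvent(event);
    }
  };
}

class KeyRingBase {
  public running = false;
}

// Lifecycle methods are synchronous here only to keep the sketch short.
class KeyRing extends Evented(KeyRingBase) {
  public start(): void {
    this.dispatchEvent(new EventStart());
    this.running = true;
    this.dispatchEvent(new EventStarted());
  }
  public stop(): void {
    this.dispatchEvent(new EventStop());
    this.running = false;
    this.dispatchEvent(new EventStopped());
  }
}

const seen: string[] = [];
const keyRing = new KeyRing();
for (const type of ['start', 'started', 'stop', 'stopped']) {
  keyRing.addEventListener(type, (e) => seen.push(e.type));
}
keyRing.start();
keyRing.stop();
```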

CMCDragonkai commented 1 year ago

A functional representation of the eventbus would be "observables". It'd be interesting to compare if one could create observables on the domains and use that concept instead.

CMCDragonkai commented 1 year ago

See: https://stackoverflow.com/a/47214496/582917

Observables are a more refined version of EventTarget. Their primary innovation is that the subscription itself is represented by a first-class object, the Observable, which you can then apply combinators (such as filter, map, etc.) over. They also make the choice to bundle together three signals (conventionally named next, complete, and error) into one, and give these signals special semantics so that the combinators respect them. This is as opposed to EventTarget, where event names have no special semantics (no method of EventTarget cares whether your event is named "complete" vs. "asdf"). EventEmitter in Node has some version of this special-semantics approach where "error" events can crash the process, but that's rather primitive.

Another nice feature of observables over events is that generally only the creator of the observable can cause it to generate those next/error/complete signals. Whereas on EventTarget, anyone can call dispatchEvent(). This separation of responsibilities makes for better code, in my experience.

But in the end, both events and observables are good APIs for pushing occurrences out into the world, to subscribers who can tune in and tune out at any time. I'd say observables are the more modern way to do this, and nicer in some ways, but events are more widespread and well-understood. So if anything was intended to replace events, it'd be observables.
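To make the quoted points concrete, here is a deliberately tiny observable, hand-written so it has no dependencies. It shows the subscription as a first-class value with `map` and `filter` combinators, the special semantics of `complete`, and that only the creator's producer function can emit. `MiniObservable` is a hypothetical stand-in; the RxJS `Observable` is far more complete and is not reproduced here.

```typescript
interface Observer<T> {
  next(value: T): void;
  error(err: unknown): void;
  complete(): void;
}
type Unsubscribe = () => void;

// Hypothetical minimal observable, for illustration only.
class MiniObservable<T> {
  // Only this producer function can emit next/error/complete signals.
  private readonly producer: (observer: Observer<T>) => Unsubscribe;
  constructor(producer: (observer: Observer<T>) => Unsubscribe) {
    this.producer = producer;
  }

  public subscribe(partial: Partial<Observer<T>>): Unsubscribe {
    let closed = false;
    const teardown = this.producer({
      next: (v) => {
        if (!closed) partial.next?.(v);
      },
      error: (e) => {
        if (!closed) {
          closed = true;
          partial.error?.(e);
        }
      },
      complete: () => {
        if (!closed) {
          closed = true;
          partial.complete?.();
        }
      },
    });
    return () => {
      closed = true;
      teardown();
    };
  }

  // Combinators return new observables and forward error/complete unchanged.
  public map<U>(f: (value: T) => U): MiniObservable<U> {
    return new MiniObservable<U>((o) =>
      this.subscribe({
        next: (v) => o.next(f(v)),
        error: (e) => o.error(e),
        complete: () => o.complete(),
      }),
    );
  }

  public filter(predicate: (value: T) => boolean): MiniObservable<T> {
    return new MiniObservable<T>((o) =>
      this.subscribe({
        next: (v) => {
          if (predicate(v)) o.next(v);
        },
        error: (e) => o.error(e),
        complete: () => o.complete(),
      }),
    );
  }
}

const numbers = new MiniObservable<number>((observer) => {
  [1, 2, 3, 4].forEach((n) => observer.next(n));
  observer.complete();
  observer.next(5); // ignored: nothing is delivered after complete
  return () => {};
});

const received: string[] = [];
numbers
  .filter((n) => n % 2 === 0)
  .map((n) => n * 10)
  .subscribe({
    next: (v) => received.push(`next ${v}`),
    complete: () => received.push('complete'),
  });
// received is now ['next 20', 'next 40', 'complete']
```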

And with that being said, I think I finally understand where rxjs would play a role in PK. (I already like the idea of ixjs too, since we already have a lot of pull-oriented APIs using async iterables and iterables.)

CMCDragonkai commented 1 year ago

I'm going to commandeer this issue to talk about Observable instead. It's a higher level of event target/eventbus. And it could help create a push-based flow around PK, even as actions may be triggered from pull-based flows.

Basically PK will have both push and pull data flows going on, as most applications do. This flow is like a "circle".

                      Data Flow
        ┌─────────────────────────────────────┐
        │                                     │
┌───────┴───────┐                    ┌────────▼───────┐
│               │                    │                │
│ Desired State ├────────Push────────► Realised State │
│               │                    │                │
└───────┬───────┘                    └────────▲───────┘
        │                                     │
        └─────────────────────────────────────┘
                    Control Flow

                      Data Flow
        ┌─────────────────────────────────────┐
        │                                     │
┌───────┴───────┐                    ┌────────▼───────┐
│               │                    │                │
│ Desired State ├────────Pull────────► Realised State │
│               │                    │                │
└───────▲───────┘                    └────────┬───────┘
        │                                     │
        └─────────────────────────────────────┘
                    Control Flow

CMCDragonkai commented 1 year ago

There are only 2 uses of the event bus atm:

PolykeyAgent.eventSymbols.KeyManager - reacting to root key pair change MatrixAI/Polykey#386
PolykeyAgent.eventSymbols.Proxy - reacting to new connection establishments to your own node and getting NodeGraph to react

So integrating Observable means restructuring how we do push data flow. In particular, I'm looking at having domains produce observable properties (these are like values over time) that downstream users can subscribe to. I suspect something like RxJS's BehaviorSubject is what we need for this.

It's important that there's a way to divide observable objects down to smaller observable things. So if there's an observable structure, I want to be able to "divide" it into its observable properties. And I should be able to build up small observable properties to larger observables, to the point that the entire domain would be observable too.

CMCDragonkai commented 1 year ago

Some references regarding observable properties in a class, and the observable class itself:

Atm, it's easy enough to make a property observable. But what if you want the whole object/class to also be observable. This currently seems kind of vague. I asked it here: https://github.com/ReactiveX/rxjs/pull/6157#issuecomment-1269117053

CMCDragonkai commented 1 year ago

It seems that one shouldn't be extending Observable, and I wouldn't have liked to do this anyway.

Therefore it seems that you mostly make the properties observable.

That being said, it's possible to create a "composed" observable similar to rxdb.

For example in rxdb, the entire database can be observed by doing: db.$.subscribe. The same pattern is reused when subscribing to a "collection" then to a "document". Each of these objects gives us .$ property. So you can subscribe the entire DB, then get each collection, subscribe to the collection, then get each document and subscribe to each document.

Therefore it seems that a viable design is something like:

// gives us an observable of the entire keyRing (in this case... it may just be the POJO or it can be the entire class?)
keyRing.$
// literal value
keyRing.nodeId
// observable of the property
keyRing.nodeId$
// observable of the property
keyRing.keyPair$

The $ seems to be a conventional way of representing an observable object.

Then there's the matter of what exactly you're subscribing to. Would keyRing.$ give you instances of KeyRing? Remember this is a singleton object, so one might argue that the same object is returned and any subscriptions just execute.

However one has to trigger the keyRing.$ based on lower level changes, so you might want to "build up" observables from smaller observables.
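A minimal sketch of the naming convention and of building `keyRing.$` up from smaller property observables follows. `ReactiveValue` is a hypothetical, dependency-free stand-in for a BehaviorSubject-like "current value" observable, and the `KeyRing` fields, the `KeyRingState` shape and `rotateKeyPair` are illustrative assumptions rather than the real Polykey class. Here `$` emits a plain snapshot object, which is only one of the options discussed above.

```typescript
type Listener<T> = (value: T) => void;

// Hypothetical stand-in for a BehaviorSubject-like value: it holds a current
// value and replays it to each new subscriber.
class ReactiveValue<T> {
  protected current: T;
  protected listeners = new Set<Listener<T>>();
  constructor(initial: T) {
    this.current = initial;
  }
  public get value(): T {
    return this.current;
  }
  public set(value: T): void {
    this.current = value;
    for (const listener of this.listeners) listener(value);
  }
  public subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    listener(this.current);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

interface KeyRingState {
  nodeId: string;
  keyPair: string;
}

class KeyRing {
  // Observables of individual properties.
  public readonly nodeId$ = new ReactiveValue<string>('node-1');
  public readonly keyPair$ = new ReactiveValue<string>('keypair-1');
  // Observable of the whole domain object, built up from the smaller ones.
  public readonly $: ReactiveValue<KeyRingState>;

  constructor() {
    this.$ = new ReactiveValue<KeyRingState>(this.snapshot());
    const refresh = () => this.$.set(this.snapshot());
    this.nodeId$.subscribe(refresh);
    this.keyPair$.subscribe(refresh);
  }

  // Literal value, for pull-style access.
  public get nodeId(): string {
    return this.nodeId$.value;
  }

  // Hypothetical mutation that acts as the origin of change.
  public rotateKeyPair(keyPair: string): void {
    this.keyPair$.set(keyPair);
  }

  protected snapshot(): KeyRingState {
    return { nodeId: this.nodeId$.value, keyPair: this.keyPair$.value };
  }
}

const keyRing = new KeyRing();
const emissions: string[] = [];
keyRing.$.subscribe((s) => emissions.push(`${s.nodeId}/${s.keyPair}`));
keyRing.rotateKeyPair('keypair-2');
// emissions is now ['node-1/keypair-1', 'node-1/keypair-2']
```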

CMCDragonkai commented 1 year ago

Some useful discussion about making our classes observable too: https://github.com/garretpremo/rxjs-observed-decorator/issues/10.

Note that I'm also thinking there are going to be 3 kinds of observables here:

At the level of the domain object
At the level of a specific domain property
Stateful things: being able to subscribe to state changes. For example, you add or update something in a domain, and can simultaneously subscribe to changes on it. This is a bit different from properties, since it is more like a data stream. In such a case we would have to decide whether to provide "history", or only changes from the point of subscription.

I imagine one shouldn't acquire history from stateful observables, just from this point onwards. If you want to sync the current state, you should have called that first, while acquiring the change stream, and mapping that change stream. There is a sort of a race condition here that should be prevented by ensuring that you don't lose change data while you're acquiring the current state.

This might even help with CQRS subsequently, and ultimately with propagating such a structure to the frontend.

This means we may have decorators for the class, decorators for properties.

What about decorators for "updating" methods? Those are the "origin" of change.

Imagine:

sigChain.getClaims(); // gets you an `AsyncIterable`
sigChain.getClaims$(); // gets you an `Observable`

But you can't just decorate the async generator. Cause you'd be holding a transaction there while you are reading (and you don't really want to do that). Furthermore, that would be a pull based API.

Instead getClaims$() would be derived directly from the methods that update the data. Like addClaim, deleteClaim.... etc.
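A sketch of that idea, under stated assumptions: an in-memory `Map` stands in for the database, the `ClaimChange` shape and the `ChangeStream` interface are hypothetical, and methods are synchronous to keep it short. The change stream is fed only by the updating methods and replays no history.

```typescript
type ClaimChange =
  | { type: 'add'; id: string; claim: string }
  | { type: 'delete'; id: string };

// Hypothetical minimal observable-like interface for this sketch.
interface ChangeStream<T> {
  subscribe(listener: (change: T) => void): () => void;
}

class SigChain {
  // In-memory stand-in for the persistent store.
  protected claims = new Map<string, string>();
  protected changeListeners = new Set<(change: ClaimChange) => void>();

  // Pull: callers iterate the current claims on demand.
  public async *getClaims(): AsyncGenerator<[string, string]> {
    for (const entry of [...this.claims]) yield entry;
  }

  // Push: derived from the updating methods below, not from `getClaims`.
  // Subscribers only see changes from the point of subscription onwards.
  public getClaims$(): ChangeStream<ClaimChange> {
    return {
      subscribe: (listener) => {
        this.changeListeners.add(listener);
        return () => {
          this.changeListeners.delete(listener);
        };
      },
    };
  }

  public addClaim(id: string, claim: string): void {
    this.claims.set(id, claim);
    this.emit({ type: 'add', id, claim });
  }

  public deleteClaim(id: string): void {
    // Emit only when something was actually deleted.
    if (this.claims.delete(id)) this.emit({ type: 'delete', id });
  }

  protected emit(change: ClaimChange): void {
    for (const listener of this.changeListeners) listener(change);
  }
}

const sigChain = new SigChain();
sigChain.addClaim('c1', 'first'); // before subscription, so not replayed
const changes: ClaimChange[] = [];
const unsubscribe = sigChain.getClaims$().subscribe((c) => changes.push(c));
sigChain.addClaim('c2', 'second');
sigChain.deleteClaim('c1');
unsubscribe();
sigChain.addClaim('c3', 'third'); // not observed: already unsubscribed
```

No transaction is held open on behalf of subscribers here, which is the point of deriving the stream from the mutations rather than from the async generator.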

CMCDragonkai commented 1 year ago

I've done a bit of research into how to use rxjs. I think we can do this now, it's actually not that big of a change. The main thing is to provide decorators that reduce the boilerplate for creating "dynamic"/"reactive" properties, and being able to wire that up to a reaction on the entire domain object.

The other thing is the way it interacts with js-async-init. Because we may have properties that only exist after the object is "running", this means the properties should be guarded by getters that will be decorated to check if we are in the running state.

This means there may be reactive properties that are "static", and reactive properties that are only acquired through the getter.

Subsequently downstream classes can subscribe to them on start and enable the ability for eventually consistent reaction.
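The distinction between "static" reactive properties and ones that only exist while running could look roughly like this. The running check is written by hand inside the getter; a decorator could perform the same check. `createReactive`, `ErrorNotRunning`, `NodeConnectionManager` and its members are hypothetical names, and `start`/`stop` are synchronous only for brevity.

```typescript
type Listener<T> = (value: T) => void;
interface Reactive<T> {
  subscribe(listener: Listener<T>): () => void;
  set(value: T): void;
}

// Hypothetical helper: a current-value signal that replays on subscribe.
function createReactive<T>(initial: T): Reactive<T> {
  let current = initial;
  const listeners = new Set<Listener<T>>();
  return {
    subscribe(listener) {
      listeners.add(listener);
      listener(current);
      return () => {
        listeners.delete(listener);
      };
    },
    set(value) {
      current = value;
      listeners.forEach((listener) => listener(value));
    },
  };
}

class ErrorNotRunning extends Error {}

class NodeConnectionManager {
  // "Static" reactive property: exists for the whole life of the object.
  public readonly running$ = createReactive<boolean>(false);
  // Only created by `start()`, so access goes through the guarded getter.
  protected _connections$?: Reactive<number>;
  protected connectionCount = 0;

  public get connections$(): Reactive<number> {
    if (this._connections$ == null) {
      throw new ErrorNotRunning('NodeConnectionManager is not running');
    }
    return this._connections$;
  }

  public start(): void {
    this.connectionCount = 0;
    this._connections$ = createReactive<number>(0);
    this.running$.set(true);
  }

  public stop(): void {
    this._connections$ = undefined;
    this.running$.set(false);
  }

  // Hypothetical mutation that feeds the reactive property.
  public addConnection(): void {
    this.connectionCount += 1;
    this.connections$.set(this.connectionCount);
  }
}

const manager = new NodeConnectionManager();
let guarded = false;
try {
  void manager.connections$; // not running yet, so the getter throws
} catch (e) {
  guarded = e instanceof ErrorNotRunning;
}

manager.start();
// A downstream class would subscribe at this point, during its own start.
const counts: number[] = [];
manager.connections$.subscribe((n) => counts.push(n));
manager.addConnection();
// counts is now [0, 1]
```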

The usage of BehaviorSubject makes sense for a "current value" thing. This roughly corresponds to continuous data, but it is a bit different because one doesn't actually sample it; it is still really a signal. See https://github.com/kriskowal/gtor/ (I think rxjs uses slightly different terminology).

The usage of Subject makes sense for anything intended to be a stream of data. Although there is more to it, because replay subject could also be used.

We'll focus on the integration of BehaviorSubject first.

CMCDragonkai commented 1 year ago

I've also noticed that we are still using readable-stream for interactions between Vaults and Git:

~/Projects/Polykey/src $ ag 'readable-stream'
vaults/VaultManager.ts
23:import { PassThrough } from 'readable-stream';

git/types.ts
1:import type { PassThrough } from 'readable-stream';

git/utils.ts
22:import { PassThrough } from 'readable-stream';

We should also be replacing the readable-stream with the new JS Web Streams as the standard interface across JS runtimes: https://nodejs.org/docs/latest-v16.x/api/webstreams.html.

This should also be applied to the EFS. Although I'm not sure about that, since it is meant to "replicate" the behaviour of Node streams (it may be fine to just use the Node streams instead of readable-stream). The primary reason to use readable-stream was to be browser compatible. Alternatively, it can just update to v4 of readable-stream.

But PK itself should be using web streams.
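For reference, the nearest Web Streams analogue of a `PassThrough` is an identity `TransformStream` (one constructed without a transformer). The sketch below assumes a runtime where `TransformStream`, `TextEncoder` and `TextDecoder` are globals, such as Node 18 or later or a browser; on Node 16 the stream classes would need to be imported from `stream/web`. The `collect` and `writeAll` helpers and the chunk contents are illustrative only and are not taken from the Vaults or Git code.

```typescript
// Collects a byte stream into a string; a helper for this sketch only.
async function collect(readable: ReadableStream<Uint8Array>): Promise<string> {
  const decoder = new TextDecoder();
  const reader = readable.getReader();
  let text = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return text + decoder.decode();
}

// Writes all chunks, then closes the writable side.
async function writeAll(
  writable: WritableStream<Uint8Array>,
  chunks: string[],
): Promise<void> {
  const encoder = new TextEncoder();
  const writer = writable.getWriter();
  for (const chunk of chunks) {
    await writer.write(encoder.encode(chunk));
  }
  await writer.close();
}

// Identity transform: whatever is written to `writable` comes out of `readable`.
const passThrough = new TransformStream<Uint8Array, Uint8Array>();

// Start writing and reading concurrently. Awaiting the writes before a reader
// is attached would stall, because the stream applies backpressure.
const written = writeAll(passThrough.writable, ['0008', 'done']);
const collected = collect(passThrough.readable);
```

Unlike a Node `PassThrough`, the readable and writable sides are separate objects, so code that passes a single duplex stream around would need to pass the pair or the relevant half.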

CMCDragonkai commented 1 year ago

Note that rxjs is designed for push-based dataflow, whereas our streams/RPC are focused on pull-based dataflow. The expectation is that between processes/agents there will be pull-based flow. But internally a pull-flow can be converted to a push-flow that triggers reaction points in other parts of the application. This should be considered in relation to the RPC design.
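One way to picture the pull-to-push conversion: a loop drains a pull-based source (an `AsyncIterable`, standing in for an RPC stream) and pushes each item to whoever is subscribed. `PushBridge`, `pump` and `remoteNotifications` are hypothetical names; error handling and cancellation are left out of this sketch.

```typescript
type Listener<T> = (value: T) => void;

// Hypothetical bridge from a pull-based source to push-based subscribers.
class PushBridge<T> {
  protected listeners = new Set<Listener<T>>();

  public subscribe(listener: Listener<T>): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Pulls items one at a time and pushes each to the current subscribers.
  public async pump(source: AsyncIterable<T>): Promise<void> {
    for await (const item of source) {
      for (const listener of this.listeners) listener(item);
    }
  }
}

// Stand-in for a pull-based stream arriving from another agent.
async function* remoteNotifications(): AsyncGenerator<string> {
  yield 'node-a connected';
  yield 'node-b connected';
}

const bridge = new PushBridge<string>();
const reactions: string[] = [];
// A reaction point elsewhere in the application.
bridge.subscribe((msg) => reactions.push(`reacted to: ${msg}`));
const pumped = bridge.pump(remoteNotifications());
```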

CMCDragonkai commented 1 year ago

@tegefaulkes

CMCDragonkai commented 1 year ago

Level 0 abstraction - hardcoded jumps
Level 1 abstraction - callbacks - "hook points" - more flexible as the user can decide how to plug into the control flow of the module
Level 2 abstraction - event emitter/pubsub - usually not type safe, but the caller can now dynamically change the reactive flow diagram
Level 3 abstraction - observable/reactive variables - composable and decomposable and combinators and type safe

As we go more abstract, we gain more flexibility, and at level 3 in particular we gain stronger type safety.

One thing that's peculiar about level 3 abstraction is that it unifies the data-flow and control-flow. As in data-flow can be used to trigger control-flow on a higher level.

For example in our experience in retool, when a change occurs in module 1, and that needs to trigger another thing in module 2, it's not possible to use level 0, 1, or 2. Instead one must export a reactive variable from module 1 and pass it into module 2 where module 2 will subscribe to these changes. So the reactive variable itself is dataflow, however the dataflow itself is being used to express logic, if the intention is for module 2 to perform some other logic as part of a larger control flow abstraction. For example module 1 might be writing something to a DB, then needs to trigger module 2 to fetch something from the DB.

Note that the above are general patterns; the exact implementation may vary and have different trade-offs. But you can see what tends to be expressed with observables, versus with event emitters and pub/sub, versus with injected callbacks executed at hook points, versus with hardcoded jumps.
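The four levels can be put side by side using the "module 1 writes, module 2 fetches" scenario. This is only a sketch: the function names, the `written` event name and the `createReactive` helper are hypothetical, and the trace array stands in for real side effects.

```typescript
// Level 0: hardcoded jump; module 1 names module 2 directly.
function fetchFromDb(trace: string[]): void {
  trace.push('module2: fetch');
}
function writeLevel0(trace: string[]): void {
  trace.push('module1: write');
  fetchFromDb(trace);
}

// Level 1: callback hook; the caller decides what runs at the hook point.
function writeLevel1(trace: string[], onWritten: () => void): void {
  trace.push('module1: write');
  onWritten();
}

// Level 2: events; listeners can be attached and removed at runtime, but the
// event name and payload are only loosely typed.
const module1Events = new EventTarget();
function writeLevel2(trace: string[]): void {
  trace.push('module1: write');
  module1Events.dispatchEvent(new Event('written'));
}

// Level 3: module 1 exports a typed reactive variable; module 2 subscribes.
function createReactive<T>(initial: T) {
  let current = initial;
  const listeners = new Set<(value: T) => void>();
  return {
    get: () => current,
    set(value: T): void {
      current = value;
      listeners.forEach((listener) => listener(value));
    },
    // No replay in this sketch: subscribers see changes from now on.
    subscribe(listener: (value: T) => void): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
  };
}
const lastWrittenId = createReactive<number | null>(null);
function writeLevel3(trace: string[], id: number): void {
  trace.push('module1: write');
  lastWrittenId.set(id); // the dataflow itself triggers module 2's logic
}

const trace: string[] = [];
writeLevel0(trace);
writeLevel1(trace, () => fetchFromDb(trace));
module1Events.addEventListener('written', () => fetchFromDb(trace));
writeLevel2(trace);
lastWrittenId.subscribe((id) => trace.push(`module2: fetch row ${id}`));
writeLevel3(trace, 42);
```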

In our code base we have used level 1, 2 and 3 in varying places. Usually level 1 is used when the module is relatively simple, and level 2 when we need more flexibility. Level 3 hasn't been taken advantage of yet in Polykey, and I think it will be particularly relevant moving ahead, and into PKE too. Level 3 is something that would be necessary in networked scenarios.

If you think about it, webhooks are an example of level 1 and level 2. They are level 1 when you are interacting with them directly, since you basically have to assign callbacks up front, and then there's no dynamic flexibility afterwards. They are level 2 when we think about how many different clients/customers may be using the same system to do a webhook. But they definitely aren't level 3. This is because there's no standardised notion of a reactive variable over HTTP; we just have various stream protocols, and they all tend to be bespoke. See how things have transitioned from HTTP polling to SSE, web sockets, etc., which is quite a lot more varied compared to a standard "restful" API.

CMCDragonkai commented 1 year ago

Also one may note level 3 builds on top of level 2 and 1 mechanisms too. So it's important to understand when to use what.

CMCDragonkai commented 1 year ago

We will have both push and pull dataflows here in PK. But PKE might be the first place to start using it in earnest before converting PK to it.

CMCDragonkai commented 11 months ago

If websocket conversations (MatrixAI/js-ws#2) are possible, then push flows from PK can push all the way past the program boundary and send events out by initiating calls to the client side. This is doable as long as a connection already exists. If a connection doesn't exist, one can be started, but that requires the other side to be able to run a server.

CMCDragonkai commented 11 months ago

Some notes from the experiments in PKE.

Every domain object exposes 2 kinds of things:

Pull-methods
Reactive Properties

The pull methods are any method call that returns a synchronous value or Promise or AsyncIterable. In fact any property is itself a pull method. Properties can be considered nullary functions.

class C {
  prop: T1;
  f(): T2 {
  }
  async g(): Promise<T3> {
  }
  *h(): Generator<T4> {
  }
  async *i(): AsyncGenerator<T5> {
  }
}

All of the above are examples of "pull-methods".

This is how we have structured all of our code currently in PK.

The addition of reactive properties enables push-orientation that is "additive" to the architecture without breaking compatibility or changing the "pull-orientation".

Reactive properties are observables, they may be nullary observables or parameterised observables. Although... I am not sure if parameterised observables make sense here.

class C {
  // PULL ORIENTATION
  prop: T1;
  f(): T2 {
  }
  async g(): Promise<T3> {
  }
  *h(): Generator<T4> {
  }
  async *i(): AsyncGenerator<T5> {
  }
  // PUSH ORIENTATION
  $: Observable<C>; // special self-referential reactive property, could be considered a sort of `this.$`
  prop$: Observable<T6>;
  j$(): Observable<T7>;
}

You can see here that it looks pretty similar to the pull methods, but the introduction of the Observable type enables compositional, type-safe push-orientation.

Now other domain objects can be dependency-injected (DIed) with objects exposing reactive properties to produce their own reactive properties.

When looking at this from a bird's-eye view, it produces a sort of push-pull architecture.

[Diagram: Pull Oriented (excalidraw)]

[Diagram: Push Oriented (excalidraw)]

This of course exists within a single program boundary. Inter-program boundary push-pull can only be done if the "network IO" supports it. Our RPC in js-rpc should enable this ability even in the situation where clients connect to servers, due to the existence of RPC conversations.

Another cool thing is that each object can maintain their own state, or reactive state. Then stateful objects are like lakes where reactive properties are like streams feeding into states. State inside each object is accumulated possibly using various collection structures. It does not impose anything on how to structure the state.

Immutability is also not necessary; it is optional. However, when binding to React components, immutability is necessary. Further exploration of this is in PKE.

Finally one must be clear that by default program architecture should be pull-oriented. It's the most natural way of programming.

Push-architecture should only ever be introduced if you have the need to have either multiple origins of change, or multiple receivers of change. That's when push-architecture should be introduced. If you don't have this, there is no need for push-orientation.

CMCDragonkai commented 11 months ago

We have observed Observables to be the functional synthesis of all kinds of push-structures, whether they are callbacks, event emitters, pub/sub, streaming, observer pattern. I believe they are currently the SOTA in terms of push-orientation.

RxJS is the most well maintained implementation of Observable in JS land. But because Observable is not native to the language (and it can be argued that it shouldn't be), it should only be used within the application tier, not the library tier. Libraries can continue to use EventTarget, but applications should then convert such uses of EventTarget into Observable.
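The conversion at the application tier can be small. The sketch below wraps an `EventTarget` in a minimal subscribable, similar in spirit to RxJS's `fromEvent` but hand-written so it carries no dependency. `Subscribable`, `fromEventTarget`, `map`, `EventConnection` and the `connection` event name are all hypothetical names for illustration.

```typescript
type Unsubscribe = () => void;
interface Subscribable<T> {
  subscribe(next: (value: T) => void): Unsubscribe;
}

// Hypothetical adapter from an EventTarget to a subscribable.
function fromEventTarget<E extends Event>(
  target: EventTarget,
  type: string,
): Subscribable<E> {
  return {
    subscribe(next) {
      const listener = (e: Event) => next(e as E);
      target.addEventListener(type, listener);
      // Unsubscribing removes the underlying listener.
      return () => target.removeEventListener(type, listener);
    },
  };
}

// A single combinator, to show what the conversion buys the application.
function map<T, U>(source: Subscribable<T>, f: (value: T) => U): Subscribable<U> {
  return { subscribe: (next) => source.subscribe((v) => next(f(v))) };
}

// Library tier: exposes only EventTarget and a typed event.
class EventConnection extends Event {
  public readonly nodeId: string;
  constructor(nodeId: string) {
    super('connection');
    this.nodeId = nodeId;
  }
}
const libraryObject = new EventTarget();

// Application tier: converts the event source and composes on top of it.
const connectedNodeIds = map(
  fromEventTarget<EventConnection>(libraryObject, 'connection'),
  (e) => e.nodeId,
);

const nodeIds: string[] = [];
const unsubscribeNodeIds = connectedNodeIds.subscribe((id) => nodeIds.push(id));
libraryObject.dispatchEvent(new EventConnection('node-a'));
unsubscribeNodeIds();
libraryObject.dispatchEvent(new EventConnection('node-b')); // no longer observed
// nodeIds is now ['node-a']
```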

Any usage of Observable within a library might need to be encapsulated within to avoid leakage.

Or... if you prefer to focus on observable-only libraries, one can supply plugins that wrap existing libraries to expose Observable.

One day we may expect Observable to become a native construct of a programming language, once the concept is fully understood and all the kinks are worked out.

CMCDragonkai commented 11 months ago

Interaction between pull and push.

Pull methods are called to perform side-effects or anything. These may result in errors. Errors are always returned to the caller via exceptions or promise rejections.
Push reactive properties never output exceptions (unless errors are what they are supposed to indicate). Only successful state transitions should be emitted to reactive properties. This keeps downstream systems listening only for successfully completed state transitions.

CMCDragonkai commented 10 months ago

Some ideas here that still need to be summarised. Best events architecture is being iterated on in js-quic, and will then be summarised for usage in other libraries and PK itself.

Afterwards, it still makes sense to stick with Evented instead of applying Observable everywhere, but instead one may augment our js-async-init with Observable later. I can imagine an optional dependency or peer dependency as a plugin...
