Kelerchian commented 1 year ago

Potential problem

This issue will list some potential problems the current API does not handle.

UB from incorrect event payload schema in storage

A swarm protocol is identified by its name, an arbitrary string.
A swarm protocol contains information on its events.
The schema of the event payload exists only as a TypeScript type, so it can only be parsed when a TypeScript compiler SDK is available.
A swarm protocol's events are identified by 2 tags: "[swarm_name]" and "[swarm_name]:[arbitrary_id]"
2 swarm protocols with the same name can "run" on a single Actyx instance.
Consequently:
- A machine runner can accidentally subscribe to events from a different version of the swarm with the same name. This can cause the state machine in the machine runner to come to an unintended conclusion.
- An event's data-less schema, combined with a subscription to the wrong swarm protocol version, can crash the machine runner at runtime.
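
A minimal sketch of the collision described above. The `tagsFor` helper and the two payload types are hypothetical; they only mirror the two-tag scheme and are not the machine-runner API.

```typescript
// Hypothetical helper mirroring the tag scheme "[swarm_name]" and "[swarm_name]:[arbitrary_id]".
const tagsFor = (swarmName: string, id: string): string[] => [
  swarmName,
  `${swarmName}:${id}`,
];

// Two incompatible revisions of an event payload, both belonging to a swarm named "warehouse".
type OrderedV1 = { type: "ordered"; item: string };
type OrderedV2 = { type: "ordered"; items: string[] };

const tagsOfV1 = tagsFor("warehouse", "order-1");
const tagsOfV2 = tagsFor("warehouse", "order-1");

// Nothing in the tags tells the revisions apart, so a V2 runner
// also receives events that were written with the V1 payload shape.
const tagsCollide = tagsOfV1.join("|") === tagsOfV2.join("|");

// The payload type is erased at runtime: the cast compiles, but the value is still a V1 event.
const v1Event: OrderedV1 = { type: "ordered", item: "bolt" };
const fromStorage: unknown = v1Event;
const misread = fromStorage as OrderedV2;
const itemsIsMissing = misread.items === undefined;
```

Any code that goes on to call `misread.items.map(...)` would then fail at runtime, which is the crash the second bullet refers to.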

Specific-role projection is not well-formed.

In order to define a correct swarm-protocol and its machine-protocols, these protocols have to be written separately and then cross-examined using machine-check.
These definitions are separately written, but all definitions represent a subset of the same imaginary state machine, the one written as the SwarmProtocol itself.
Furthermore, the well-formedness of the machine protocols is checked individually. It is possible to instead check the well-formedness of the swarm protocol and then derive the machine protocols from it.

Proposed Solution

The first problem can be addressed with documentation: caveats, versioning how-tos, best practices, and examples.

The second problem may be negligible, depending on how much writability the developer can afford to sacrifice for the sake of easily solving the distributed-state-machine problem.

However, here is an alternative API:

The 3-step compilation

Swarm Protocol Design: Events and the swarm state machine (state labels and transitions) are defined here. The event payload schema is dataful and uniquely identifiable. The swarm state machine name and the event payload schema are combined into a unique identifier for the swarm protocol. The format of the event tag will roughly be "[swarm_name]:[swarm_identifier]:[arbitrary_id]". This information is compiled into what we will simply call the "swarm protocol" from this point forward. The swarm protocol includes 1.) the swarm name, 2.) the events, and 3.) the unique identifier. It is used in the later steps.
Machine Protocol Design: This step uses the compilation result of the first step, the "swarm protocol". It defines 1.) the roles involved in the swarm protocol and 2.) for each role, the relevant states from the swarm protocol, optionally with payloads assigned to these states, and the commands assigned to each of these states. A command contains 1.) the command name and 2.) the chain of events that it will emit. This information, grouped by role, is compiled into several "agent protocols", one for each role.
State Machine Design: From an "agent protocol", we can deduce the complete list of "reactions" and "commands". The first action in this step is to extract the "reactions" and "commands" into a set of definitions that can be used to verify the user's code. Optionally, this extraction can also produce boilerplate code, on top of which the user then writes the details of the "reactions" and "commands".
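
The three steps could be sketched roughly as follows. Every type and function name here is hypothetical, and the projection makes a loud simplification: it assumes every role observes every transition, whereas a real projection would restrict reactions to the events a role subscribes to.

```typescript
// Step 1: the swarm protocol as plain data (hypothetical shape).
type SwarmTransition = { from: string; to: string; role: string; command: string; events: string[] };
type SwarmProtocol = { swarmName: string; identifier: string; initial: string; transitions: SwarmTransition[] };

// Step 2: one agent protocol per role, derived from the swarm protocol.
type AgentProtocol = {
  role: string;
  commands: { state: string; command: string; events: string[] }[];
  reactions: { state: string; events: string[]; next: string }[];
};

const projectRole = (swarm: SwarmProtocol, role: string): AgentProtocol => ({
  role,
  // A role may only invoke the commands assigned to it.
  commands: swarm.transitions
    .filter((t) => t.role === role)
    .map((t) => ({ state: t.from, command: t.command, events: t.events })),
  // ASSUMPTION: every role reacts to every transition (no subscription filtering).
  reactions: swarm.transitions.map((t) => ({ state: t.from, events: t.events, next: t.to })),
});

// Step 3: extract the definitions that the user's code has to provide.
const requiredDefinitions = (agent: AgentProtocol): string[] =>
  agent.commands
    .map((c) => `command ${c.state}.${c.command}`)
    .concat(agent.reactions.map((r) => `reaction ${r.state} on [${r.events.join(", ")}]`));

const warehouse: SwarmProtocol = {
  swarmName: "warehouse",
  identifier: "abc123",
  initial: "Idle",
  transitions: [
    { from: "Idle", to: "Requested", role: "Manager", command: "request", events: ["requested"] },
    { from: "Requested", to: "Delivered", role: "Storage", command: "deliver", events: ["delivered"] },
  ],
};

const forManagerRole = projectRole(warehouse, "Manager");
const managerDefinitions = requiredDefinitions(forManagerRole);
```

In this sketch the Manager ends up with one command (`Idle.request`) and two reactions. The list returned by `requiredDefinitions` is the kind of output that could drive either verification of user code or boilerplate generation.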

Kelerchian commented 1 year ago

Implementation Detail:

Event schema should use JSON schema instead of zod.
State payload schema should also use JSON schema instead of TypeScript types.
Developer Experience: zod or io-ts may be used to ease the writing of JSON schema.
[swarm_identifier] is a hash, produced by hashing the JSON schema.
JSON schema hashing should be able to identify equivalent JSON schemas. For example:
- Different ordering of the same set of events should produce the same hash
- { type: "string" } and { allOf: [{ type: "string" }] } should produce the same hash
- { "type": "object", "properties": { "a": { "type": "number" }, "b": { "type": "string" } } } and { "type": "object", "properties": { "b": { "type": "string" }, "a": { "type": "number" } } } should produce the same hash
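
A minimal sketch of such an equivalence-aware hash, covering only the three cases listed above. All names are hypothetical. The `fnv1a` function is a small non-cryptographic stand-in so that the example has no dependencies; a real identifier would presumably use a cryptographic hash such as SHA-256. Other equivalences (for example reordered `required` or `enum` arrays) are not handled.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Canonical form: object keys are sorted, and an object whose only keyword is an
// `allOf` with a single member is replaced by that member.
const canonicalize = (value: Json): Json => {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value === null || typeof value !== "object") return value;
  const keys = Object.keys(value).sort();
  if (keys.length === 1 && keys[0] === "allOf") {
    const members = value["allOf"];
    if (Array.isArray(members) && members.length === 1) return canonicalize(members[0] as Json);
  }
  const sorted: { [key: string]: Json } = {};
  for (const key of keys) sorted[key] = canonicalize(value[key] as Json);
  return sorted;
};

// Stand-in hash (FNV-1a style over UTF-16 code units), NOT collision resistant.
const fnv1a = (text: string): string => {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
};

const hashSchema = (schema: Json): string => fnv1a(JSON.stringify(canonicalize(schema)));

// Hash each event schema on its own, then sort the hashes so event order does not matter.
const hashEventSet = (schemas: Json[]): string => fnv1a(schemas.map(hashSchema).sort().join("\n"));

const plainString: Json = { type: "string" };
const wrappedString: Json = { allOf: [{ type: "string" }] };
const objectAB: Json = { type: "object", properties: { a: { type: "number" }, b: { type: "string" } } };
const objectBA: Json = { type: "object", properties: { b: { type: "string" }, a: { type: "number" } } };

const sameForAllOf = hashSchema(plainString) === hashSchema(wrappedString);
const sameForKeyOrder = hashSchema(objectAB) === hashSchema(objectBA);
const sameForEventOrder = hashEventSet([plainString, objectAB]) === hashEventSet([objectBA, wrappedString]);
```

Canonicalizing before hashing keeps the hash function itself trivial; each additional equivalence rule becomes one more rewrite inside `canonicalize`.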

rkuhn commented 1 year ago

Yes, we’re thinking along similar lines here. I’m not yet sure how to best integrate the detailed state computations — without them (and consequently without event payload data) we could condense everything down to the types given in the ECOOP paper. But that’s not expressive enough for real protocols.

One idea here is that we generate the code for the machine definition but with placeholders for the command hooks and event transitions. Forgetting to overwrite one of the placeholders would immediately lead to an exception. This way we can keep the generated code in a separate file — messing with an existing user file is always tricky.
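
A rough sketch of the placeholder idea, assuming a hypothetical `Hooks` shape with one event transition and one command hook. In this sketch the exception is raised on first use of a forgotten hook, which is one possible reading of "immediately".

```typescript
// --- generated file (sketch): every hook starts out as a placeholder that throws ---
type Hooks = {
  onRequested: (payload: { item: string }) => string; // hypothetical event transition
  request: (item: string) => { type: "requested"; item: string }[]; // hypothetical command hook
};

const placeholder = (hookName: string) => (): never => {
  throw new Error(`machine hook "${hookName}" was not overwritten`);
};

const generatedHooks: Hooks = {
  onRequested: placeholder("onRequested"),
  request: placeholder("request"),
};

// --- user file: overwrite the placeholders; `request` is forgotten here on purpose ---
const userHooks: Hooks = {
  ...generatedHooks,
  onRequested: (payload) => `Requested(${payload.item})`,
};

const nextState = userHooks.onRequested({ item: "bolt" });

let forgottenHookError = "";
try {
  userHooks.request("bolt");
} catch (error) {
  forgottenHookError = error instanceof Error ? error.message : String(error);
}
```

Because the user file only spreads and overrides `generatedHooks`, the generated file can be regenerated without touching user code.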

Kelerchian commented 1 year ago

> that’s not expressive enough for real protocols.

Not quite sure I understand.

> One idea here is that we generate the code for the machine definition but with placeholders for the command hooks and event transitions. Forgetting to overwrite one of the placeholders would immediately lead to an exception.

Thinking the same too.

Kelerchian commented 1 year ago

Anyway, for the current state of machine-runner, with TypeScript and stuff, we can add a manual versioning API so that the generated tags are "[swarmprotocolname]" and "[swarmprotocolname]:[version]", and withId then results in "[swarmprotocolname]:[version]:[id]".
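
As a sketch, the tag scheme could look like this. `versionedTags` is a hypothetical helper, not an existing machine-runner function, and it assumes the id-specific tag is added on top of the two base tags.

```typescript
// Hypothetical tag builder for the manual-versioning scheme.
const versionedTags = (protocolName: string, version: string, id?: string): string[] => {
  const tags = [protocolName, `${protocolName}:${version}`];
  // Corresponds to what `withId` would add in this proposal.
  if (id !== undefined) tags.push(`${protocolName}:${version}:${id}`);
  return tags;
};

const tagsV1 = versionedTags("warehouse", "1", "order-1");
// ["warehouse", "warehouse:1", "warehouse:1:order-1"]

// A version 2 runner subscribes to its own most specific tag, which version 1 never emits.
const v2Subscription = "warehouse:2:order-1";
const v1LeaksIntoV2 = tagsV1.indexOf(v2Subscription) !== -1;
```

Keeping the bare "[swarmprotocolname]" tag still allows queries across all versions, while the versioned tags keep runners of different versions apart.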

Kelerchian commented 1 year ago

@rkuhn this came up while examining SW's unit test problem: a new API that can be compatible with our current API

const protocol = 
  Protocol
    .build("theprotocolname", Events.all)
    .roles([
      "Manager",
      "Storage"
    ])
    .states([
      "StateA",
      "StateB",
      "StateC",
      "StateD",
      "StateE",
      "StateF",
      "StateG",
      "StateH",
    ])
    .initial((states) => states.StateA)
    //                   ^^^^^^^^^^^^^
    //                   hinted
    .transitions(({
      command, states, roles
    }) => {
      /**
       * List transitions here, the transitions are best written chronologically from the top to bottom
       */
      command(roles.Manager, states.StateA, "commandName", [Events.B], states.StateB)
      //      ^^^^^^^^^^^^^                 ^^^^^^^^^^^ ^^^^^^^^^^^^^
      //      hinted                        constrained hinted
      command(roles.Storage, states.StateB, "commandName", [Events.C], states.StateC)
      command(roles.Manager, states.StateB, "commandName", [Events.D], states.StateD)
      command(roles.Storage, states.StateD, "commandName", [Events.E], states.StateE)
      command(roles.Manager, states.StateC, "commandName", [Events.F], states.StateF)
    })
    /* alternatively, a command can instead take this shape if we want to process the type information
    but the above one is safer */
    .command(({states, roles}) => [roles.Manager, states.StateA, "commandName", [Events.B], states.StateB])
    //                             ^^^^^^^^^^^^^  ^^^^^^^^^^^^^                 ^^^^^^^^^^^ ^^^^^^^^^^^^^
    //                             hinted         hinted                        constrained hinted
    .command(({states, roles}) => [roles.Storage, states.StateB, "commandName", [Events.C], states.StateC])
    .finish()

/**
 * used in machine-check, produce SwarmProtocolType
 */
const protocolAnalysis = protocol.createJSONForAnalysis();

// Role creation
// ===============

const ForManager = protocol.roles.Manager.createProtocol()
const ForStorage = protocol.roles.Storage.createProtocol()

// State Creation
// ===============

const StateAForManager = 
  ForManager
    .states.StateA.design()
    //      ^^^^^^
    //      hinted
    .withPayload<ThePayload>()
    /* probably is not safe from TS version change */
    .commands.commandName.define([Events.B], (ctx, param) => [param])
    /* or alternatively command can take place like this */
    .command(protocol.commands.commandName, [Events.B], (ctx, param) => [param])
    //       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    //       hinted if type is previously registered
    .finish()

// Simpler checks
// NOTE: Throws if somehow this method is called after `checkSwarmProtocol` below is called for this state
ForManager.react(...)

// MACHINE CHECK
// ===============

// Simpler checks
const allRoles = [ForManager, ForStorage] as const

// NOTE: 
// - machine-check knows machine-runner has `createJSONForAnalysis` method. It'll use it to grab the subscriptions.
// - Initial is provided
checkSwarmProtocol(protocol, allRoles)
checkProjection(protocol, allRoles, ForManager)

rkuhn commented 1 year ago

A completely different approach could be to split state payload computation from state transitions: the observer sees the state name and the sequence of events that led to this state (i.e. the unhandled ones are filtered out). Payload computation could then be fully decoupled and independently versioned.
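
A small sketch of that split, with hypothetical names throughout: `observe` only tracks the state name and the handled events, while the payload is computed by a separate function that could be versioned on its own.

```typescript
type SwarmEvent = { type: string; amount?: number };
type Step = { from: string; on: string; to: string };

// State transitions only: returns the state name and the events that led to it.
const observe = (initial: string, steps: Step[], events: SwarmEvent[]) => {
  let state = initial;
  const handled: SwarmEvent[] = [];
  for (const incoming of events) {
    const match = steps.filter((s) => s.from === state && s.on === incoming.type)[0];
    if (match === undefined) continue; // unhandled events are filtered out
    state = match.to;
    handled.push(incoming);
  }
  return { state, handled };
};

// Payload computation is fully decoupled; a later `totalV2` would not affect `observe`.
const totalV1 = (handled: SwarmEvent[]): number =>
  handled.reduce((sum, e) => sum + (e.amount === undefined ? 0 : e.amount), 0);

const steps: Step[] = [
  { from: "Idle", on: "requested", to: "Requested" },
  { from: "Requested", on: "paid", to: "Paid" },
];

const observed = observe("Idle", steps, [
  { type: "requested" },
  { type: "noise" }, // not valid in state "Requested", so it is dropped
  { type: "paid", amount: 5 },
]);
const paidTotal = totalV1(observed.handled);
```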

Another (orthogonal) choice would be to use hashing instead of manual versioning, identifying a swarm protocol with the hash of its state machine description. This implies that new instances will not process events written by old instances, which should be fine for many use-cases. Where continuing an old process with new logic is required, a translation scheme like Cambria would be needed, explicitly opting into the processing of old events via a transformation function.
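
To illustrate the opt-in part, here is a sketch in which each stored event carries the hash of the protocol that wrote it. The names and the event envelope are assumptions, and the translator is a plain function; this does not use Cambria's actual API.

```typescript
type Payload = { [key: string]: unknown };
type StoredEvent = { protocolHash: string; payload: Payload };
type Translator = (old: Payload) => Payload;

// Events written under another protocol hash are ignored unless a translator
// has been explicitly registered for that hash.
const acceptEvent = (
  currentHash: string,
  translators: Map<string, Translator>,
  stored: StoredEvent
): Payload | undefined => {
  if (stored.protocolHash === currentHash) return stored.payload;
  const translate = translators.get(stored.protocolHash);
  return translate === undefined ? undefined : translate(stored.payload);
};

const translators = new Map<string, Translator>();
const oldEvent: StoredEvent = { protocolHash: "hash-old", payload: { item: "bolt" } };

// Without a translator the new instance does not process the old event.
const ignoredResult = acceptEvent("hash-new", translators, oldEvent);

// Opting in: translate the old single-item payload into the new list-based shape.
translators.set("hash-old", (old) => ({ items: [old["item"]] }));
const translatedResult = acceptEvent("hash-new", translators, oldEvent);
```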

Kelerchian commented 1 year ago

> the hash of its state machine description

But this will require the state machine description to be fully written as values, not types (unless we want to invoke the TypeScript compiler API manually as part of the compilation process).

Although, in my opinion, the semantics of event sets and event chains are what truly need versioning, while state payloads do not.

rkuhn commented 1 year ago

> But this will require the state machine description to be fully written as values, not types (unless we want to invoke the TypeScript compiler API manually as part of the compilation process).

Right, using hashing without splitting the state machine from the payload computation makes this more difficult. Anyway, these are future thoughts, I want to first await real world feedback on our current APIs before starting this.

Actyx / machines

[Brain dump] UB From Dataless Schema, Projection Workflow Duplication, and Proposed Solution #51
