engine: can we emit events straight to the phoenix socket?

josephjclark commented 11 months ago

Something is bothering me.

When the runtime emits an object - a log or a run result - that object gets serialized many times:

It is serialised to a string to be sent out of the worker thread into the main thread
Then it is parsed from JSON back into an object in the main thread
Then it is lightly wrapped and converted back to a string to send to lightning via the phoenix socket
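
To make the cost concrete, here's a rough sketch of those three hops. The event shape and the `wrapForLightning` helper are made up for illustration and aren't taken from the engine or worker code.

```typescript
// Hypothetical event shape; the real runtime events may look different.
type RuntimeEvent = { type: string; payload: unknown };

// Count every conversion between object and string
let conversions = 0;
const encode = (value: unknown): string => {
  conversions += 1;
  return JSON.stringify(value);
};
const decode = (text: string): unknown => {
  conversions += 1;
  return JSON.parse(text);
};

// Assumption: the "light wrapping" is modelled as a plain object around the event
const wrapForLightning = (inner: unknown) => ({ topic: "attempt", event: inner });

function currentPath(emitted: RuntimeEvent): string {
  const overThreadBoundary = encode(emitted); // 1. worker thread -> main thread
  const inMainThread = decode(overThreadBoundary); // 2. parsed in the main thread
  return encode(wrapForLightning(inMainThread)); // 3. wrapped and stringified for the socket
}

const wireMessage = currentPath({ type: "log", payload: { message: "hello" } });
```

That's three conversions for one event, and each extra process boundary in the chain would add at least two more.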

As a result of this there's also quite an ugly chain of event mappings between the runtime and the worker. And I think if we have a worker_thread -> child_process -> engine main -> worker architecture, as I'm planning to introduce, the amount of serialisation goes up.

But if the engine connected to the socket directly, we could do less serialisation and less conversion.

Now, there are problems with this. It's a major blurring of the engine and the worker - they both do the same thing, and in effect the engine is coupled to lightning.

The worker is supposed to just be a lightning interface layer, and the engine is supposed to be a generic, long running, multi-threaded (whatever that means) wrapper around the actual runtime.

That's a nice architecture really, with a strong separation of concerns. But it may be stretched a little too thin, and the cost of serialisation may be too high.

Maybe a better approach is:

The engine is designed to use a websocket
The websocket implementation is pluggable - basically you'd give it a module name and that module is loaded by the deepest child worker
The engine also provides callbacks (or perhaps a module because we may not be able to pass functions through) to convert the data structures
Or even better, the whole eventing layer is abstracted out - the engine sends to a generic event emitter, and a pluggable layer inside the engine listens to events and does what it wants
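
A minimal sketch of that last option, assuming a plugin is just a function which gets handed the emitter and registers whatever hooks it likes. All the names here (`EventBus`, `EnginePlugin`, `createEngine`) are hypothetical, and the "socket" is an array standing in for a real connection.

```typescript
// Hypothetical types; none of these names come from the engine itself.
type EngineEvent = { type: string; payload: unknown };
type EventHook = (emitted: EngineEvent) => void;

// The generic emitter the engine would send everything to
class EventBus {
  private hooks: EventHook[] = [];

  on(hook: EventHook): void {
    this.hooks.push(hook);
  }

  emit(emitted: EngineEvent): void {
    for (const hook of this.hooks) {
      hook(emitted);
    }
  }
}

// A plugin receives the bus and decides for itself what to listen to
type EnginePlugin = (bus: EventBus) => void;

// Stand-in for a plugin that would push events to a websocket
const sentFrames: string[] = [];
const socketPlugin: EnginePlugin = (bus) => {
  bus.on((emitted) => {
    sentFrames.push(JSON.stringify(emitted));
  });
};

// The engine knows nothing about lightning; it only loads plugins
function createEngine(plugins: EnginePlugin[]): EventBus {
  const bus = new EventBus();
  plugins.forEach((plugin) => plugin(bus));
  return bus;
}

const engineBus = createEngine([socketPlugin]);
engineBus.emit({ type: "log", payload: "starting" });
```

The engine side stays generic here: anything lightning-specific would live in the plugin that the worker supplies.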

The worker needs to track the life cycle of the attempt, but it doesn't really need to know all the state objects and stuff. Even better if no state gets loaded into any shared memory at all. So you have like a lightweight eventing layer which doesn't send any state objects (or log messages) - basically a blind layer which sees events but not their payload (also ideal for tracing and external debugging!) - and a deeper layer which sends full payloads out to lightning.

It's a big change but food for thought.

The engine executes the runtime
It loads a plugin into the working process which listens to all events and gets payloads. I guess basically the plugin is just called with the internal engine/runtime instance and can register whatever hooks it likes
It sends redacted messages out of the working process (who redacts what? Maybe every message has a payload key which is redacted, but the rest gets sent out as metadata)
The worker receives these redacted messages in order to manage its own lifecycle
It also registers a plugin to receive full payloads and connect to a lightning socket
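
As a sketch of the redaction idea, assuming every message carries a `payload` key and everything else counts as metadata. The field names (`attemptId`, `timestamp`) and the two arrays are invented for the example.

```typescript
// Hypothetical message shape: metadata plus a single payload key
type FullMessage = {
  type: string;
  attemptId: string;
  timestamp: number;
  payload: unknown;
};
type RedactedMessage = Omit<FullMessage, "payload">;

// Copy the metadata across explicitly, so the payload can never leak by accident
function redact(message: FullMessage): RedactedMessage {
  return {
    type: message.type,
    attemptId: message.attemptId,
    timestamp: message.timestamp,
  };
}

const workerView: RedactedMessage[] = []; // what leaves the working process
const pluginView: FullMessage[] = []; // what the in-process plugin sees

function dispatch(message: FullMessage): void {
  pluginView.push(message); // full payload stays inside the process
  workerView.push(redact(message)); // only metadata crosses the boundary
}

dispatch({
  type: "attempt-complete",
  attemptId: "a1",
  timestamp: 0,
  payload: { state: { data: "sensitive" } },
});
```

With this split the worker can still follow the life cycle from `type` and `attemptId`, without ever holding a state object.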

josephjclark commented 9 months ago

Stu very wisely suggested that the innermost worker should send the payload as a string, so that it isn't constantly serialised and deserialised
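
Roughly what that could look like, assuming an envelope where only `type` is ever read by the outer layers. The `Envelope` shape and both function names are hypothetical.

```typescript
// Hypothetical envelope: the payload is already a JSON string when it leaves the inner thread
type Envelope = { type: string; payload: string };

function createEnvelope(type: string, data: unknown): Envelope {
  // The only place the data itself is serialised
  return { type, payload: JSON.stringify(data) };
}

// Outer layers route on `type` and never parse the payload
function toSocketFrame(envelope: Envelope): string {
  // Splice the pre-serialised payload into the frame instead of re-encoding it
  return `{"type":${JSON.stringify(envelope.type)},"payload":${envelope.payload}}`;
}

const socketFrame = toSocketFrame(createEnvelope("log", { message: "hello" }));
```

The string still gets copied between threads and processes, but nothing in the middle has to parse or rebuild the object.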

josephjclark commented 9 months ago

In order for this to work, the inner thread will create its own socket to lightning. That itself is fine, but you'd need to send the worker token down into the thread as well as the attempt token in order to connect to the socket. Now there are two JWTs in the sandbox environment.

We should be able to secure those JWTs so that a breakout can't get them. But it is nice to be able to say "the worker environment is totally clean and there's nothing sensitive in it"

josephjclark commented 1 month ago

This is back in contention again because we THINK that the main worker thread is being bombarded with socket messages and creating a bottleneck. As capacity increases, more and more messages are being processed by the main worker thread, slowing everything down.

The idea of having the child process speak directly to lightning feels like it'll add huge performance gains.

But the architecture worries me. The engine is supposed to be generic but now we're asking it to know about lightning. We're blurring the line between the worker and the engine.

Can the worker inject plugins somehow? Can the worker push code to hook to events inside the process? That's all we're talking about - we don't want to change the engine's behaviour, we just want to hook to events inside the engine.

OpenFn / kit

engine: can we emit events straight to the phoenix socket? #544