Lahja is a generic multi-process event bus implementation written in Python 3.6+ to enable lightweight inter-process communication, based on non-blocking asyncio.
What is wrong
Currently we use pickle for serialization of events across the bus.
This is not ideal because:
It can result in a remote code execution vulnerability.
It can be a performance bottleneck.
How can it be fixed.
Let Event implementations specify their own serialization/deserialization.
This means that we'll need a simple message envelope for transmission of messages across the bus as well as a way for multiple endpoints to negotiate their event types so that connected endpoints can reliably communicate about event types.
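As a rough illustration of an event that carries its own serialization, here is a minimal sketch. The class name PeerCountChanged, its field, and the deserialize classmethod are assumptions made for this example; only event.serialize() is named in this proposal, and a real event would presumably subclass the library's event base class rather than stand alone.

```python
import struct


class PeerCountChanged:
    """Hypothetical event carrying a single unsigned 32-bit counter."""

    def __init__(self, peer_count: int) -> None:
        self.peer_count = peer_count

    def serialize(self) -> bytes:
        # Fixed-width little-endian encoding; pickle is not involved.
        return struct.pack('<I', self.peer_count)

    @classmethod
    def deserialize(cls, payload: bytes) -> 'PeerCountChanged':
        # struct.unpack raises struct.error unless payload is exactly 4 bytes.
        (peer_count,) = struct.unpack('<I', payload)
        return cls(peer_count)


event = PeerCountChanged(peer_count=25)
restored = PeerCountChanged.deserialize(event.serialize())
```

Because the receiving side only ever calls a known deserialize method on raw bytes, a malformed payload can at worst raise an error, which is the property pickle cannot offer.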
Here is a simple starter idea for this.
Message envelope is:
DATA_LENGTH | DATA
DATA = EVENT_ID | PAYLOAD
DATA_LENGTH is a 4-byte little endian unsigned integer
EVENT_ID is a 2-byte little endian unsigned integer
PAYLOAD is the raw bytes of the serialized event.
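The layout above could be encoded and decoded roughly as follows. This is a sketch under the field sizes stated here; the function names encode_message and decode_message are made up for illustration and are not part of any existing API.

```python
import struct
from typing import Tuple

HEADER = struct.Struct('<I')    # DATA_LENGTH: 4-byte little-endian unsigned int
EVENT_ID = struct.Struct('<H')  # EVENT_ID: 2-byte little-endian unsigned int


def encode_message(event_id: int, payload: bytes) -> bytes:
    # DATA = EVENT_ID | PAYLOAD, prefixed with the length of DATA.
    data = EVENT_ID.pack(event_id) + payload
    return HEADER.pack(len(data)) + data


def decode_message(message: bytes) -> Tuple[int, bytes]:
    (data_length,) = HEADER.unpack_from(message, 0)
    data = message[HEADER.size:HEADER.size + data_length]
    if len(data) != data_length:
        raise ValueError('truncated message')
    # Raises struct.error if DATA is too short to hold an EVENT_ID.
    (event_id,) = EVENT_ID.unpack_from(data, 0)
    return event_id, data[EVENT_ID.size:]


message = encode_message(7, b'hello')
event_id, payload = decode_message(message)
```

Note that DATA_LENGTH counts the 2-byte EVENT_ID as well as the payload, so a reader on a stream can read 4 bytes, then exactly DATA_LENGTH more, without knowing anything about the event type.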
Two endpoints which are connected to each other will need a way to communicate a mapping of EVENT_ID -> EventType. This should probably be a new internal message type.
A standing question is which identifier an endpoint uses to reference an event class. I have two ideas.
Use a string of the dot separated import path of the class.
Require explicit pre-registration of event classes.
The first makes for simple UX, but it might result in some ambiguity and might not support dynamically created classes.
The second ends up with some coordination cost but I think it is my preference. We can probably provide a simple API for doing this that reduces boilerplate and maybe even makes it automatic for common use cases.
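To make the second option more concrete, a pre-registration API might look something like the sketch below. Everything here is hypothetical: the EventRegistry class, its method names other than get_event_id, the sequential id assignment, and the two example event classes are assumptions for illustration only.

```python
from typing import Dict


class EventRegistry:
    """Hypothetical registry that assigns each event class a 2-byte EVENT_ID."""

    MAX_EVENT_ID = 0xFFFF  # largest value a 2-byte unsigned integer can hold

    def __init__(self) -> None:
        self._id_by_type: Dict[type, int] = {}
        self._type_by_id: Dict[int, type] = {}

    def register(self, event_type: type) -> type:
        """Register a class and return it, so this works as a decorator."""
        if event_type not in self._id_by_type:
            event_id = len(self._id_by_type)  # ids are handed out sequentially
            if event_id > self.MAX_EVENT_ID:
                raise ValueError('no EVENT_ID left in the 2-byte id space')
            self._id_by_type[event_type] = event_id
            self._type_by_id[event_id] = event_type
        return event_type

    def get_event_id(self, event: object) -> int:
        # Raises KeyError for events whose class was never registered.
        return self._id_by_type[type(event)]

    def get_event_type(self, event_id: int) -> type:
        return self._type_by_id[event_id]


registry = EventRegistry()


@registry.register
class PeerJoined:
    pass


@registry.register
class PeerLeft:
    pass
```

With sequential assignment the ids depend on registration order, so two endpoints would still need to exchange their EVENT_ID -> EventType mapping (or register in an agreed order) before they can interpret each other's messages.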
When an endpoint sends an event, it would look up the EVENT_ID for the type of the event being sent, call event.serialize(), and pack both into the message envelope:
import struct
data = event.serialize()
# DATA_LENGTH covers the 2-byte EVENT_ID plus the serialized payload.
message = struct.pack('<IH', len(data) + 2, get_event_id(event)) + data
When an endpoint receives a message: