Filtering events before sending to a client

clavay commented 4 months ago

Hello,

I am working on PyScada, a SCADA application built with Django. I want to use SSE to send real-time and historical data to connected clients.

The system reads data for many variables from connected devices. Each client needs real-time data for a list of variables. A client is not always allowed to read every variable, so the server must decide whether or not to send it each piece of data.

My idea for the architecture is a broadcast channel to which every device sends the data for the variables it reads. I need a filter function for each client that lets the server decide whether or not to send the data (depending on the client's rights). I was thinking of adding a validation_function to sse_encode_event, like this:

def sse_encode_event(event_type, data, event_id=None, escape=False, json_encode=False, validation_function=None):
    if json_encode:
        data = json.dumps(data, cls=DjangoJSONEncoder)
    if escape:
        event_type = build_id_escape(event_type)
        data = build_id_escape(data)
    if validation_function is not None and not validation_function(event_type, data, event_id):
        return ""
    out = "event: %s\n" % event_type
    if event_id:
        out += "id: %s\n" % event_id
    if "\n" in data:
        # Handle multi-line data
        for line in data.split("\n"):
            out += "data: %s\n" % line
        out += "\n"  # At the end pop an additional new line to cap off the data.
    else:
        out += "data: %s\n\n" % data
    return out

And specify this function here and here. The event function should pass this to the stream function in utils.

What do you think about this? Do you have a better solution?

jkarneges commented 4 months ago

The standard way to handle this would be to have a channel per variable and subscribe each client to the channels of interest, and then use a channel manager to control access. Is there a reason this wouldn't work? Are there a lot of variables?
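A minimal sketch of the channel-per-variable idea in plain Python, without Django. The `var-<name>` naming scheme, the `PERMISSIONS` table, and the `VariableChannelPolicy` class are all hypothetical; in django-eventstream this kind of logic would live in a channel manager, whose exact interface should be checked against the library's documentation.

```python
# Hypothetical permission table: which variables each user may read.
PERMISSIONS = {
    "alice": {"temperature", "pressure"},
    "bob": {"temperature"},
}


def channel_for(variable):
    """Map a variable name to its own channel name (naming is an assumption)."""
    return "var-%s" % variable


class VariableChannelPolicy:
    """Plain-Python stand-in for a channel manager; not the library's class."""

    def get_channels_for_user(self, user):
        # One channel per variable the user is allowed to read.
        return {channel_for(v) for v in PERMISSIONS.get(user, set())}

    def can_read_channel(self, user, channel):
        # Access is granted only for channels in the user's allowed set.
        return channel in self.get_channels_for_user(user)


policy = VariableChannelPolicy()
bob_channels = policy.get_channels_for_user("bob")
```

With this layout the server subscribes each connection to its allowed set up front, so no per-event check is needed once the connection is established.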

clavay commented 4 months ago

I thought about this option, but the number of variables depends on the use case. In mine, I can easily have more than 50 variables. That's why I'm thinking of using a broadcast channel and letting the server decide which clients to send the information to.

jkarneges commented 4 months ago

Hmm, yes 50 channels per connection would be quite a lot. Currently there is a limit of 10.

The main problem with lots of channels is amplification effects on other components, for example when using django-eventstream along with Pushpin chained to a message broker or something. However, if you are only using django-eventstream itself (as you would have to be for this validation function mechanism to work), then I think the effect would merely be a reasonable increase in memory usage within django-eventstream. In that case, 50+ channels per connection is probably fine. Maybe even 1000 channels per connection would be fine.

What do you think about simply making the limit configurable and using more channels?

clavay commented 4 months ago

On the client side, is there no cost to keeping 50 or more channels open?

I think it is a better architectural approach to have a broadcast channel where all the devices can send new data, with the web server in charge of distributing the messages.

Why don't you like my proposal of a validation function per client?

jkarneges commented 4 months ago

Client awareness of channels is optional. Normal use of django-eventstream is for the server to select the channels, e.g.:

urlpatterns = [
    ...
    path('events/', include(django_eventstream.urls), {'channels': list_of_50_channels})
    ...
]

The main reason I suggest trying to use channels is it can scale better if you grow to multiple server nodes, but maybe that's not a concern.

I suppose the advantage of the broadcast channel is that it is more dynamic. You could grant a user access to an existing variable or start sending a new variable, and existing client connections could receive the data. Otherwise, clients would have to reconnect to get new channels assigned. I'm open to a PR that lets the user provide a validation function (maybe call it a filter?). But it shouldn't go in sse_encode_event, as that's a serialization function.
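One way to picture keeping the filter out of the serializer is to run the per-client check first and only encode the events that pass. This is a rough sketch in plain Python; `encode_event`, `events_for_client`, `client_filter`, and `allowed_variables` are hypothetical names for illustration, not django-eventstream APIs.

```python
import json


def encode_event(event_type, data):
    """Serialization only: turn one event into SSE wire format."""
    return "event: %s\ndata: %s\n\n" % (event_type, json.dumps(data))


def events_for_client(events, event_filter):
    """Apply the per-client filter before serialization.

    `events` is an iterable of (event_type, data) pairs and
    `event_filter` is a callable returning True to deliver the event.
    """
    for event_type, data in events:
        if event_filter(event_type, data):
            yield encode_event(event_type, data)


# Hypothetical rights: this client may only read the "temperature" variable.
allowed_variables = {"temperature"}


def client_filter(event_type, data):
    return data.get("variable") in allowed_variables


broadcast = [
    ("update", {"variable": "temperature", "value": 21.5}),
    ("update", {"variable": "pressure", "value": 1013}),
]
out = list(events_for_client(broadcast, client_filter))
```

Because the filter is called for every event, a change to `allowed_variables` takes effect on already-open connections, which is the dynamic behaviour described above.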

clavay commented 4 months ago

Should the filter go here and here?

jkarneges commented 4 months ago

seems reasonable

fanout / django-eventstream, issue #147