Closed by htdvisser 2 years ago
Related to #2887
The most chatty end device events are:
ns.up.data.receive (which is published for each duplicate)
ns.up.data.drop
ns.up.data.process
ns.up.data.forward
as.up.data.receive
as.up.data.forward
ns.down.data.schedule.attempt
ns.down.data.schedule.success
as.packages.storage.up.store
ns.mac.link_adr.request
ns.mac.dev_status.request
ns.mac.new_channel.request
ns.mac.link_adr.answer.accept
ns.mac.rx_timing_setup.request
Most of these are in my opinion not relevant to the application, and therefore don't need to be forwarded to application event streams.
I'm happy with any irrelevant event type that we can omit by default.
Also compare with what events we show by default in the Console:
'as.*.drop',
'as.down.data.forward',
'as.up.location.forward',
'as.up.data.forward',
'as.up.service.forward',
'gs.down.send',
'gs.gateway.connect',
'gs.gateway.disconnect',
'gs.status.receive',
'gs.up.receive',
'js.join.accept',
'js.join.reject',
'ns.mac.*.answer.reject',
I'd like to see how things perform after https://github.com/TheThingsNetwork/lorawan-stack/pull/4831. If this is still an issue, we can indeed start cutting back the NS events a bit.
This is only partly related to #4831. While #4831 improves the performance on the event subscriber side, this issue is trying to improve the performance on the event publisher side, and of the pub/sub system (usually Redis).
My only concern is that this behavior is inconsistent. It's slightly weird that I would normally go to the application Live Data and see only some events, given that before I saw everything. But I fully agree that an incomplete and inconsistent stream won't help much (maybe only for programmatic consumers), and that this becomes worse with scale.
This means that for applications with thousands of devices the event stream won't be able to keep up, and drop events.
I can't fully find where this occurs in the publishing code. Could you please give some pointers here?
We currently don't actually drop them, which means that the event system will probably just grind to a halt, which would impact everyone instead of just the application concerned.
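To illustrate the difference between "drop" and "grind to a halt", here is a minimal sketch of a bounded buffer that discards new events when it is full instead of blocking the publisher. This is not how The Things Stack publishes events; the `DroppingBuffer` class and its capacity are hypothetical and only show the trade-off being discussed.

```python
import queue


class DroppingBuffer:
    """Hypothetical bounded buffer: drops new events when full instead of blocking."""

    def __init__(self, capacity):
        self._queue = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def publish(self, event):
        """Try to enqueue an event; return False (and count a drop) if the buffer is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self):
        """Return all buffered events in publish order and empty the buffer."""
        events = []
        while True:
            try:
                events.append(self._queue.get_nowait())
            except queue.Empty:
                return events


# A slow consumer with room for 3 events receives a burst of 5.
buf = DroppingBuffer(capacity=3)
results = [buf.publish(f"event-{i}") for i in range(5)]
print(results, buf.dropped)
```

With a blocking `put()` instead of `put_nowait()`, the fourth publish would stall the publisher, which is the "impacts everyone" scenario. Dropping confines the loss to the one overloaded stream.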
I also think only the following are relevant to application users, i.e., something that says "success" and something that helps debug "failure". Most of the time, I don't even look at the other events since they don't convey any information to me.
'as.*.drop',
'as.*.forward',
Okay so what about this:
EndDevice CRUD:
end_device.*
ns.end_device.*
as.end_device.*
js.end_device.*
AS Forwarded Uplinks/Downlinks:
as.*.forward
JS Joins:
js.join.accept
Errors and warnings:
*.fail
*.warning
*.drop
*.reject
NS MAC layer:
ns.class.switch.*
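As a rough check of what this list would do to the chatty events from the top of the thread, here is a small sketch that treats the patterns as shell-style globs. That matching rule is an assumption (the actual implementation may match names differently), and `is_forwarded` is a hypothetical helper, not an existing function in the stack.

```python
from fnmatch import fnmatchcase

# Patterns copied from the proposal above; glob semantics are assumed.
FORWARDED_PATTERNS = [
    "end_device.*", "ns.end_device.*", "as.end_device.*", "js.end_device.*",
    "as.*.forward",
    "js.join.accept",
    "*.fail", "*.warning", "*.drop", "*.reject",
    "ns.class.switch.*",
]

# The "most chatty" end device events listed at the top of the issue.
CHATTY_EVENTS = [
    "ns.up.data.receive",
    "ns.up.data.drop",
    "ns.up.data.process",
    "ns.up.data.forward",
    "as.up.data.receive",
    "as.up.data.forward",
    "ns.down.data.schedule.attempt",
    "ns.down.data.schedule.success",
    "as.packages.storage.up.store",
    "ns.mac.link_adr.request",
    "ns.mac.dev_status.request",
    "ns.mac.new_channel.request",
    "ns.mac.link_adr.answer.accept",
    "ns.mac.rx_timing_setup.request",
]


def is_forwarded(event_name):
    """Return True if the event name matches any pattern in the proposed list."""
    return any(fnmatchcase(event_name, pattern) for pattern in FORWARDED_PATTERNS)


forwarded = [name for name in CHATTY_EVENTS if is_forwarded(name)]
print(forwarded)  # ['ns.up.data.drop', 'as.up.data.forward']
```

Under these assumptions only 2 of the 14 chatty event types would still reach the application stream. Note that `ns.up.data.process` is not among them.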
I'm going to implement this for the next milestone.
Looking into this now and remembered that the Console relies on certain event types to synchronize "hot data" such as:
as.up.*.forward for Last activity (currently not planned to be filtered out)
ns.up.data.process for uplink/downlink count
Filtering those out would result in the Console not being able to keep these data points updated. So we either need to find a replacement event or keep these events unfiltered. We already ran into this problem when moving to backend-enabled filters (#4831) and had to adjust the filtering accordingly. This caused a problem where we now have to show some events in the Console simply because we rely on them internally and we removed hiding of events on the frontend.
Also, keep in mind that changing what data is being displayed can cause confusion/frustration among users if not communicated properly. So we should make sure that users will understand why the event output changed, via the changelog and other communication channels.
- as.up.*.forward events will still be included in the stream of application events.
- Do you really need ns.up.data.process in the stream of application events? It's still included in the stream of device events.
Yes, we could use the end device event stream to get this info, but it's currently still sourced from the app stream.
Re-assigning since my part is done. What's still TODO is to update the Console accordingly.
Summary
We need to filter which end device events are forwarded to the application event stream.
Why do we need this?
Because currently event streams can't keep up with large applications.
What is already there? What do you see now?
All end device events are forwarded to application event streams. This means that for applications with thousands of devices the event stream won't be able to keep up, and drop events. This makes the event stream pretty useless.
What is missing? What do you want to see?
We can think about a (possibly configurable) set of end device events that get forwarded to the application. My proposal would be to only forward application-layer events (starting with as.).
Environment
All deployments, but specifically The Things Stack Cloud.
How do you propose to implement this?
Define a list of end device event names that get forwarded to the application.
Can you do this yourself and submit a Pull Request?
Sure, but this first needs some discussion.