Device attributes in webhook json

jpmeijers commented 2 years ago

Summary

I would like to see the device attributes which are set on TTS console to appear in the json I receive via the webhook. This would allow me to have a stateless lambda function process the webhook. Depending on the attributes the lambda can do different things with the data.

Current Situation

Device attributes are not included in the webhook json data. One needs to do an api call to TTS to obtain the device attributes.

Why do we need this? Who uses it, and when?

All device configuration can be moved to a single place: TTS. Depending on the configuration on TTS, a stateless middleware can process the data and send it on to a third party system.

Example: A gps tracker need to be added to ArcGis. The tracker is added to TTS. A lambda funtion creates features on ArcGis. The configuration of the ArcGis api key, feature layer ID, and some other configurations need to be stored somewhere on a per device basis. Using the attributes on TTS would be ideal, as the same lambda function could then be used for any device or any ArcGis account.

Proposed Implementation

Add the device attributes to the webhook uplink (and other) json

Contributing

[ ] I can help by doing more research.
[ ] I can help by implementing the feature after the proposal above is approved.
[X] I can help by testing the feature before it's released.

Code of Conduct

[X] I agree to follow TTN's Community Code of Conduct.

descartes commented 2 years ago

This has been raised before, I believe the challenge is about how much is held in Redis for timely access. Plus there are petitions to reduce the size of the JSON ...

jpmeijers commented 2 years ago

The workaround I'm currently pursuing is:

Create ApiKey for webhook, giving permissions to list application devices.
Uplink happens and Lambda receives uplink json, as well as ApiKey.
Inside the Lambda do a call to /api/v3/applications/{end_device_ids.application_ids.application_id}/devices/{end_device_ids.device_id}?field_mask=name,attributes using the ApiKey.
Now I obtained the Attributes and the device name, and the Lambda can continue processing the data.

This workaround causes more load on both my Lambda as well as TTS, as would have been if the attributes were included in the uplink. However, now I also get the device name, which I anyway needed, and that is also not included in the uplink message.

descartes commented 2 years ago

I have some alpha PHP code to replicate / mirror the standing data of the device so it is locally available. If you are using the AWS Lambda service this db would need to be on another service. It periodically checks timestamps for any changes or additions. This is faster & less load on TTN which is the target audience as some aspire to use TTN as some form of online database.

descartes commented 2 years ago

I also have beta PHP code to take a payload formatted uplink and get it in to a database - again for the same reasons above. It sorts out any additions & changes in PF field names.

johanstokking commented 2 years ago

This is a sliding edge. On one hand, we don't want to put all end device data in each (uplink) event message to save bandwidth and keep everything fast (Application Server doesn't have all the data so queries Identity Server, potentially somewhere else in the world), and on the other hand we want to encourage users to go stateless. Indeed, we are aware that the workaround described above generates more load but is probably better for throughput.

It's also a sliding edge because we already decided to include device registry information (from Identity Server) in uplink messages (the device's location) so we already have the fetching and caching mechanism in place. It's relatively straightforward to extend this with other fields and that wouldn't hit throughput. However, changes in Identity Server are eventually consisent (they're not immediately propagated to Application Servers), we keep adding bandwidth and it's another set of fields becoming part of our eternal API compatibility commitment.

I think that attributes are very handy though, especially when going stateless. They could act as some sort of headers. Device name is more cosmetic. @adriansmares should we add this in ApplicationUp or ApplicationUplink/ApplicationJoinAccept?

BTW, we have API rate limiters in place for the device registry. If the uplink traffic rate goes beyond what the rate limiter permits for API calls from a certain origin, for a certain resource and/or using a certain API key, you'll get errors. So please avoid API calls back to TTS on every uplink message and cache fields.

adriansmares commented 2 years ago

I think that attributes are very handy though, especially when going stateless. They could act as some sort of headers. Device name is more cosmetic. @adriansmares should we add this in ApplicationUp or ApplicationUplink/ApplicationJoinAccept?

The genie is out of the bottle for that one as well, right ? locations, version_ids and network_ids should've been top level fields of ApplicationUp, not of ApplicationUplink.

Some thoughts:

Add the attributes to ApplicationUp. This is on paper the most correct way, since it makes the attributes available for all of the upstream messages. It has the downside of being inconsistent with the fact that the above mentioned fields are part of ApplicationUplink only.
Add the attributes to ApplicationUplink. This is on paper the most consistent way, since the rest of the 'stateless enabling' fields are here too. Not ideal because it requires state on the service side in order to process the other message types (you need to wait for an ApplicationUplink before you have the required information).
Add the stateless fields[1] to all of the messages of ApplicationUp (ApplicationUplink, ApplicationUplinkNormalized, ApplicationJoinAccept, a new binary compatible variant of ApplicationDownlink[2], ApplicationDownlinkFailed, ApplicationInvalidatedDownlinks and ApplicationServiceData). This solution makes any proponent of D.R.Y cry, but at the same time respects the following considerations:
- Existing API consumers are unaffected.
- The 'stateless enabling' fields are available on every ApplicationUp.
- The bandwidth used is equivalent to having them at the top level of ApplicationUp.
- We keep the API consistent.
Add the state fields to ApplicationUp directly, but when the message is ApplicationUplink, do not fill the fields that would be duplicated.
- The API would feel clunky, as it would be weird that the fields are not usable if the message type is ApplicationUplink.
Add the state fields to ApplicationUp directly, full stop.
- Higher bandwidth requirements for no defensible reason.

I think (3) is more reasonable, the rest seem smelly / bad design. At least (3) is defensible in a way.

[1] locations, version_ids, network_ids and attributes. [2] ApplicationDownlink is also used during submission, and it does not make much sense to allow users to submit these fields while pushing a downlink.

jpmeijers commented 2 years ago

Testing my lambda I realised the locations fields are cached, so it does not update immediately in ApplicationUplink. When I fetch the device from the API I get the changes immediately. If I give my lambda to an end user, their expectation would be that things change immediately after they change settings on TTS.

Device name is more cosmetic.

Yes, but I need a mutable identifier which an end user can change. Name would be great for this, but otherwise I could add an extra attribute. But using the name would be less confusing.

So please avoid API calls back to TTS on every uplink message and cache fields.

If I have to cache, I'm not stateless. Any recommendations here? I'm not keen to spin up a whole RDS or similar instance just to cache a few basic values. Maybe there is a simple key value store that I don't know about.

Looking at your discussion it looks like using the API to fetch the device details and attributes is the best way forward. Adding the device attributes to every uplink might be more expensive (bandwidth, computation) than a few people hitting the API on every one of only their uplinks.

adriansmares commented 2 years ago

If I have to cache, I'm not stateless. Any recommendations here? I'm not keen to spin up a whole RDS or similar instance just to cache a few basic values. Maybe there is a simple key value store that I don't know about.

DynamoDB is serverless if you're looking to only pay for what you use.

Looking at your discussion it looks like using the API to fetch the device details and attributes is the best way forward. Adding the device attributes to every uplink might be more expensive (bandwidth, computation) than a few people hitting the API on every one of only their uplinks.

The thing about the (current) eventual consistency behavior is that it is more of a implementation limitation than a hard stop requirement. I will drop some context at the end of this comment on this.

I think we would prefer that people depend on this metadata being in the uplink, because in the future we can make the consistency better, and because we know when updates occur (so we can cache for them).

The metadata-in-uplink story is actually quite old. It starts with the rather innocuous request in https://github.com/TheThingsNetwork/lorawan-stack/issues/2669 to add locations to uplink messages. We added that in https://github.com/TheThingsNetwork/lorawan-stack/pull/3294 . The initial mechanism was rudimentary:

Check if we have the end device locations in cache.
If yes, just fill.
If not, contact IS and get the location.

The limitation in doing it in this way is that when you have a cache miss, you basically block the message processing while contacting the IS. That's not ideal when a round trip to the IS takes 200 ms (transatlantic). Another obvious issue is that this data does not change very often for stationary devices, so you're stuck requesting the same information over, and over, and over....

So in the next iteration we decided that we will do 'asynchronous caching' in https://github.com/TheThingsNetwork/lorawan-stack/pull/4970. Asynchronous caching is a fancy term that basically means that even if we hit the cache, we still contact the IS in the background. This means that we don't have to block while processing uplinks almost never, and that the data is out of date for an interval ranging between 15 minutes and 4 hours (the chance that we refresh the cache increase as the time passes).

This approach still requires us to contact the IS quite often (since we don't know if an update occurred or not), and the refresh period is still not very user friendly. So we would prefer that we turn the story around a bit - what if the IS contacts the AS when the user updates the locations ? We have an issue for that ( https://github.com/TheThingsIndustries/lorawan-stack/issues/3001 ), and we can re-prioritize that if the current eventual consistency interval is too large.

descartes commented 2 years ago

Testing my lambda I realised the locations fields are cached, so it does not update immediately in ApplicationUplink

I believe that was what Johan meant with his phrase "changes in Identity Server are eventually consisent"!

Using the device name as a modifiable field for linking to own pre-existing databases has cropped up several times - mostly due to the creation of device ids that weren't consistent or typos - this is somewhat resolved now that it is easier to delete & re-register a device with its session intact. But having a 'device external id' field would be useful as sometimes the device name has meaning beyound a primary key in some remote database.

There was some angst over the loss of device attributes in the migration but I don't recall the full details of what people were hoping for.

Even if DynamoDB helps keep things stateless, it still needs keeping up to date.

One of the broader challenges is having a single source of truth and trying to stop people using the TTS databases as live data. I have a clunky desktop database that pushes most of the changes to the server but I constantly lose the battle of only allowing read only access to the console. Being able to query for a list of devices that have had standing data changed, rather like the Data Storage, would facilitate reducing server loads when it comes to a resync. Not much of an issue at present with a hundred or so devices, but one day that will hopefully be a whole heap bigger.

The other option is to have a web hook that sends changed fields. That should keep things in sync to a reasonable level but will still need to resync occasionally, just not as often.

johanstokking commented 2 years ago

Some thoughts:

[1..5]

I think (3) is more reasonable, the rest seem smelly / bad design. At least (3) is defensible in a way.

I think (3) is the best solution too. Please triage this issue or add the triage label.

As for cache updates, indeed that is tracked in https://github.com/TheThingsIndustries/lorawan-stack/issues/3001. The idea is that IS notifies AS whenever certain fields change, so AS can update its cache.

KrishnaIyer commented 1 year ago

The implementation plan is clear. I've added this to the pipeline to be picked up when we have capacity.

philbuettner commented 11 months ago

Are there any news regarding this topic? Will this feature be available in 2024? Thank you for any further information! :)

KrishnaIyer commented 10 months ago

We can pick this up sometime in Q2 I think. I'll get back here later in Q1.

theArcher73 commented 5 months ago

Hello,

The topic is not the newest anymore, but I only now have a concrete need for it on the table. I can understand the tension between the requirements to output the attributes in JSON and, on the other hand, to reduce the message payload.

Accordingly, in my opinion, it would not make sense to throw all 10 attributes into the payload of the TTS.

Would it be an option to

a) to limit the max. number of transferable attributes (possibly TTN/TTI-specific) b) define which attributes are output on the basis of a naming convention?

Steffen

theArcher73 commented 5 months ago

sorry, read the plan right now, I think it goes in an good direction

TheThingsNetwork / lorawan-stack