aws-greengrass / aws-greengrass-shadow-manager

A GreengrassV2 Component that provides offline device shadow documents and optional synchronization to the IoT device shadow service.
Apache License 2.0
9 stars 5 forks source link

(docs): JSON document merge behavior during sync not explained in docs #121

Closed AdamVD closed 1 year ago

AdamVD commented 2 years ago

Describe the bug I can't find any information in the main Shadow Manager documentation or the supplementary "Sync local device shadows with AWS IoT Core" document on what the merge behavior is when there are changes to both the local and cloud shadow.

Expected behavior It's not trivial to learn the merging/sync behavior by observation, as you'd need to have a development core set up and then run something like a Python script make local shadow changes. I'd expect a description of how merges are handled in the docs so that I can reference that when designing my system.

Actual behavior Couldn't find anything in the Greengrass docs, so I resorted to scanning over the source code and unit tests in an attempt to understand the behavior. It looks like local reported changes completely override the reported state in the cloud, and desired states are merged. Is that accurate? Are timestamps taken into account for changes to string/int/boolean (leaf node) fields when merging the desired state?

Additional Context My org has found Shadows and the Shadow Manager extremely useful for our local-first smart home system. It's no fully-featured DB, but it's a convenient document store and multi-master sync service together in one convenient and managed package. For any interactions with our smart devices (lights, climate, etc) the desired/reported model fits perfectly: desired state can be updated locally or in the cloud (effectively a request to change a device state) and then changes are reported back once made on the physical device.

However, we have some additional data requirements which don't fit into the desired/reported model such as user-configurable settings on our app. Specifically, our users can create lighting presets which consist of a few data points that we need to be able to do CRUD operations on locally and in the cloud. In the rare event that some data is changed locally and in the cloud before a sync occurs, I want the document to merge and keep the newest version of any conflicts. I was originally going to stick this data only in the reported state, but perhaps desired is what I need to use if reported always overwrites the cloud.

nikkhilmuthye commented 2 years ago

Hi Adam,

Thank you for reporting this. We will update the documentation to reflect the merge behavior in case of a merge conflict.

Currently, whenever there is a merge conflict when syncing shadows in the desired section, Shadow Manager will consider the cloud as the master and overwrite the local shadow document with that information. Whenever there is a merge conflict int he reported section, Shadow Manager will consider the local as the master and overwrite the cloud shadow with that information. This was done with the idea that the cloud will most likely be requesting some change on the device which it will propagate to the device in the desired section and vice versa.

If there is some other use case where the customer needs the merge to happen based on timestamp, we would need more information about it. We can then assess the use case and come up with a solution.

Thanks, Nikkhil

AdamVD commented 2 years ago

Thanks for the details @nikkhilmuthye. I think my implementation will have some issues if merge conflicts aren't resolved with respect to update time. I explained some of our use case in the "Additional Context" section of the OP, but I will reiterate and expand upon it here:

We are developing a local-first smart home solution that allows our app to talk directly to a server in each home when it is reachable, but falls back to a set of cloud services when necessary. The goal is to be effectively zero-downtime when inside the home as opposed to solutions which depend on the cloud and become unavailable on occasion. Our implementation uses Shadows for managing the state of devices and we have built it so that our Home Control API can be deployed in the cloud as well as on home servers and interact with Shadows in the same way. So a request to turn on a light uses the desired state of the Shadow regardless of whether that request originated inside the home or against the cloud. A different component on the home server picks up the desired change and reports the change back to the Shadow once the physical light was updated. Most of the time this process happens quickly, but we must account for a disconnect where the cloud has a desired state change that won't be synchronized to the home until it reconnects.

Now let me explain a scenario that would result in undesired behavior due to how merging works:

The home is currently disconnected from IoT Core, so Shadow syncs aren't occurring. We have a user outside the home (using cloud services) who turns a light to 10% in their app. This updates the desired state of the cloud Shadow with something like {"light": "10"}.

Now let's say that user gets home before the connection is restored and sets the same light to 80%. This updates the desired state of the local Shadow with {"light": "80"}, and is quickly followed by a reported state update once the physical light is changed.

Some time later, the connection to the IoT Core is restored and the Shadow Manager attempts to synchronize that Shadow. It sees the conflict between the values of the "light" key and uses the older cloud shadow value of 10 to overwrite the newer local Shadow value of 80. The physical light then gets updated to 10% without any additional user input, then the user inside the home walks into a wall because it's too dark and sues us. Just kidding on that last bit... but hopefully that illustrates the problem.

Possible Solution

Since we already have timestamps for all of the fields in the Shadow, we could theoretically take the newest values when performing the merge. You'd probably want to do this at the bottom-most fields (leaf-nodes). In the scenario above, 80 is the newer value for the "light" key and so the older value of 10 from the cloud would be dropped. Timestamps only have resolution to the second so you'd have to prioritize either cloud or local when they are the same, but that would be extremely rare in my use case.

Performing merges like that would be a breaking change, so you'd probably need to introduce some kind of mergeStrategy or conflictResolution configuration for Shadow Manager. I'm not sure how you'd be able to handle deleted (null) fields by the same strategy.

saranyailla commented 1 year ago

The shadow document merge conflict behavior is explained in the docs here as requested.

We've also noted down the request to resolve the shadow merge conflicts by timestamp as an enhancement to our component and we'll need to prioritize and work on it accordingly. Thanks.