aws-observability / aws-rum-web

Amazon CloudWatch RUM Web Client
Apache License 2.0

[Feature Request]: offline support #521

Open williazz opened 3 months ago

williazz commented 3 months ago

When the network connection is offline, RUM should cache events and dispatch them to the collector when the connection recovers.

williazz commented 3 months ago

One immediate issue is that RUM disables itself when the number of retries has been exhausted due to network failure. Disabling makes sense when there is a client misconfiguration, such as invalid credentials (403), because retries will never succeed. But it is poor practice when the failure is due to network conditions, because the connection might recover, in which case RUM is needlessly disabled.
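A minimal sketch of the distinction described above, assuming a hypothetical `DispatchFailure` type and treating only 403 as non-recoverable. The names are illustrative and are not the aws-rum-web API; which other statuses should count as non-recoverable is left open here.

```typescript
// Hypothetical types for illustration; not the aws-rum-web API.
type DispatchFailure =
    | { kind: 'http'; status: number }
    | { kind: 'network' }; // e.g. the request never reached the collector

type FailureAction = 'disable' | 'keep-alive';

// Assumption: only 403 (invalid credentials) is known to never succeed on retry.
const NON_RECOVERABLE_STATUSES: ReadonlyArray<number> = [403];

// Decide what the client should do once its retries are exhausted.
function actionAfterRetriesExhausted(failure: DispatchFailure): FailureAction {
    if (failure.kind === 'network') {
        // The connection may recover, so keep the client alive.
        return 'keep-alive';
    }
    return NON_RECOVERABLE_STATUSES.indexOf(failure.status) !== -1
        ? 'disable'
        : 'keep-alive';
}
```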

gmayankcse15 commented 3 months ago

Please help to prioritize the fix for RUM disabling itself when the number of retries has been exhausted due to network failure or other error codes such as 401, 307, etc. RUM should not disable itself in such scenarios and should continue to emit metrics.

williazz commented 3 months ago

We do not want to remove this feature entirely because RUM should disable when we know that retries will never succeed, such as when credentials are invalid.

To solve the immediate problem, we can add a configuration option such as disableOnFailure: boolean. We'd also want to make sure that this config is extensible for when we provide full offline support.
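As a rough sketch of how such an option could leave room for later offline settings: only `disableOnFailure` comes from this thread, while `FailureConfig`, `OfflineOptions`, `resolveFailureConfig`, and the default value are assumptions made for illustration.

```typescript
// Hypothetical config shape; only `disableOnFailure` is named in this thread.
interface OfflineOptions {
    enabled: boolean;
}

interface FailureConfig {
    // When true, RUM disables itself after retries are exhausted.
    disableOnFailure: boolean;
    // Placeholder for future opt-in offline support (assumed, not designed yet).
    offline?: OfflineOptions;
}

// Assumed default: keep the existing behavior unless the user opts out.
const DEFAULT_FAILURE_CONFIG: FailureConfig = { disableOnFailure: true };

// Merge user overrides on top of the defaults.
function resolveFailureConfig(overrides: Partial<FailureConfig> = {}): FailureConfig {
    return { ...DEFAULT_FAILURE_CONFIG, ...overrides };
}

// Usage: an application that wants RUM to stay alive after failed dispatches.
const keepAliveConfig = resolveFailureConfig({ disableOnFailure: false });
```

Keeping the offline settings in an optional nested object means the boolean can ship first without a breaking change later.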

gmayankcse15 commented 3 months ago

Sure, as an immediate solution the above config will work. Thanks. Also, could you please provide an ETA?

williazz commented 3 months ago

I've merged in the feature to keep RUM alive when dispatch fails. To close out on this issue, we need to figure out an offline strategy that users can opt into.

  1. Offline support should be opt-in because web applications by default are not configured to work offline.
  2. When offline, events should be written to the IndexedDB API. LocalStorage is not a good solution because it only supports strings, which makes it a poor fit for appending structured data. Putting events back into the event cache is also not good because that cache is stored in memory.
  3. When RUM dispatches, it should send events stored in IndexedDB and the event cache.
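The three points above could be sketched roughly as follows. This is only an illustration under stated assumptions: `RumEvent`, `OfflineStore`, `mergeForDispatch`, and `dispatchWithOffline` are hypothetical names, a rejected `send` is treated as a network failure, and the in-memory store merely stands in for an IndexedDB-backed implementation so the flow can be exercised outside a browser.

```typescript
// All names are hypothetical; this is not the aws-rum-web API.
interface RumEvent {
    id: string;
    timestamp: number; // epoch milliseconds
    type: string;
    details: string; // serialized event payload
}

// Persistent offline storage. In the browser this would be backed by
// IndexedDB; the in-memory class below only stands in for it.
interface OfflineStore {
    append(events: RumEvent[]): Promise<void>;
    drain(): Promise<RumEvent[]>; // returns all stored events and clears the store
}

class InMemoryOfflineStore implements OfflineStore {
    private events: RumEvent[] = [];
    async append(events: RumEvent[]): Promise<void> {
        this.events.push(...events);
    }
    async drain(): Promise<RumEvent[]> {
        return this.events.splice(0);
    }
}

// Combine persisted and in-memory events, dropping duplicate ids and
// ordering by timestamp so older offline events are sent first.
function mergeForDispatch(offline: RumEvent[], cached: RumEvent[]): RumEvent[] {
    const byId = new Map<string, RumEvent>();
    for (const event of [...offline, ...cached]) {
        byId.set(event.id, event);
    }
    return Array.from(byId.values()).sort((a, b) => a.timestamp - b.timestamp);
}

// Send everything pending. If sending throws (treated here as a network
// failure), persist the batch to the offline store instead of losing it.
// Intended to be called only when the application has opted in.
async function dispatchWithOffline(
    store: OfflineStore,
    cache: RumEvent[],
    send: (events: RumEvent[]) => Promise<void>
): Promise<number> {
    const batch = mergeForDispatch(await store.drain(), cache.splice(0));
    if (batch.length === 0) {
        return 0;
    }
    try {
        await send(batch);
        return batch.length;
    } catch {
        await store.append(batch);
        return 0;
    }
}
```

Hiding the storage behind an interface keeps the dispatch logic independent of IndexedDB, so the opt-in can swap in a real store without changing the flow.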

Open questions:

  1. When dispatch fails, what should happen to the payload? I recommend we put the events back in the offline cache so long as we did not receive a validation error (400).
  2. RUM events can contain sensitive user data. How should we safeguard against that, especially because RUM could potentially be configured to send "offline events" from previous users and user sessions?
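To make the two questions concrete, here is one possible shape for the answers, offered only as a sketch. The requeue rule follows the recommendation in question 1; the session and age filter is just one option for question 2 and has not been decided in this thread. `DispatchResult`, `StoredEvent`, `shouldRequeue`, and `selectSendable` are hypothetical names.

```typescript
// Hypothetical types for illustration; not the aws-rum-web API.
type DispatchResult =
    | { ok: true }
    | { ok: false; status?: number }; // status is undefined on network failure

interface StoredEvent {
    id: string;
    sessionId: string;
    storedAt: number; // epoch milliseconds when the event was written offline
}

// Question 1: put events back in the offline cache unless the collector
// rejected them with a validation error (400).
function shouldRequeue(result: DispatchResult): boolean {
    if (result.ok) {
        return false;
    }
    return result.status !== 400;
}

// Question 2 (one possible safeguard, an assumption): only send offline
// events that belong to the current session and are not older than maxAgeMs.
function selectSendable(
    events: StoredEvent[],
    currentSessionId: string,
    now: number,
    maxAgeMs: number
): StoredEvent[] {
    return events.filter(
        (event) =>
            event.sessionId === currentSessionId &&
            now - event.storedAt <= maxAgeMs
    );
}
```

Restricting to the current session is the most conservative choice; a looser policy would need some other way to avoid sending a previous user's data.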