Deferred retry of hardware_info handling when device not found

timcowlishaw commented 2 months ago

Since we deployed the devices refactor, a subtle bug (#314) has emerged: if a device sends the hardware_info packet before the user has completed registration, the device is not found, the info is not saved, and the device information in the UI is therefore incomplete. This PR should fix that (I'd still like to test on staging with a real device) by relying on Sidekiq's built-in error handling to retry the hardware_info message (with backoff) when no device corresponds to the given key. Sidekiq defaults to 25 retries with backoff (corresponding to about 3 weeks of elapsed time between the first and the last try), which should be sufficient: https://docs.gitlab.com/ee/development/sidekiq/.
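As a rough check on the "about 3 weeks" figure, the sketch below sums the default delays over 25 retries. It assumes the backoff formula as I understand it from Sidekiq's error handling documentation, `(count ** 4) + 15` seconds plus a small random jitter, and leaves the jitter out; the method names are made up for illustration.

```ruby
# Assumed Sidekiq default delay (in seconds) before retry number `count`
# (0-based), ignoring the random jitter term.
def default_retry_delay(count)
  (count**4) + 15
end

# Total time between the first attempt and the last retry, in seconds.
def total_backoff_seconds(retries = 25)
  (0...retries).sum { |count| default_retry_delay(count) }
end

seconds = total_backoff_seconds(25)
days = seconds / 86_400.0
puts "#{seconds} s is roughly #{days.round(1)} days"
```

Under that assumption the total comes to a little over 20 days, so a device registered within that window would still have its hardware_info picked up by a later retry.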

oscgonfer commented 2 months ago

[x] Retry not only on hw_info but also on data payloads, but let's keep an eye on the queue size
[x] Check whether we have control over the retry interval (30s - 10 times, 60s - 10 times...)
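A tiered interval like the one suggested above could be expressed as a small lookup from attempt number to wait time. This is only a sketch: `RETRY_TIERS` and `wait_for_attempt` are hypothetical names, only the two tiers given in the comment are encoded, and returning `nil` once they are exhausted is an assumption, since the comment does not say what comes next. ActiveJob's `retry_on` accepts a proc for `wait:` that receives the execution count, so a function of this shape could plausibly be plugged in there.

```ruby
# Tiers from the review comment: [wait in seconds, number of attempts].
# Anything beyond these two tiers is not specified in the thread.
RETRY_TIERS = [[30, 10], [60, 10]].freeze

# Returns the wait (in seconds) before the given 1-based attempt,
# or nil once the listed tiers are exhausted (an assumption of this sketch).
def wait_for_attempt(executions, tiers = RETRY_TIERS)
  remaining = executions
  tiers.each do |wait, count|
    return wait if remaining <= count
    remaining -= count
  end
  nil
end

puts wait_for_attempt(1).inspect   # first tier
puts wait_for_attempt(11).inspect  # second tier
puts wait_for_attempt(21).inspect  # past the listed tiers
```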

timcowlishaw commented 2 months ago

@oscgonfer changes above addressed. I think this probably needs another test on staging (we can look tomorrow), then it should be ready to go!

timcowlishaw commented 1 month ago

I've tweaked the retry logic to make sure the timeout works with ActiveJob; we'll need to test on staging on Thursday though. I haven't deployed for now for a couple of reasons:

1) There are real users relying on the staging forwarding mechanism, and I can't deploy without making a breaking API change to that (the auth stuff we discussed)

2) The queue is full of retries from the bridge (as we suspected would happen), and I think we should clear the queue before deploying with the new logic to make sure they don't all retry at once!
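For readers following the thread, the general idea of this PR (raise when the device is missing so the job framework retries later) can be illustrated in plain Ruby. This is a toy model rather than the code in this PR: `DeviceNotFound`, `DEVICES`, `handle_hardware_info` and `run_with_retries` are all hypothetical stand-ins, and the real retries are scheduled by Sidekiq/ActiveJob with delays rather than by an in-process loop.

```ruby
# Hypothetical error raised when no device matches the key.
class DeviceNotFound < StandardError; end

# Hypothetical in-memory stand-in for the devices table.
DEVICES = {}

# Saves the hardware info, or raises so that the caller can retry later.
def handle_hardware_info(key, info)
  device = DEVICES[key]
  raise DeviceNotFound, "no device for key #{key}" if device.nil?
  device[:hardware_info] = info
end

# Toy stand-in for a job runner: re-runs the block on DeviceNotFound,
# up to max_attempts, calling on_retry between attempts.
# Returns the number of attempts it took to succeed.
def run_with_retries(max_attempts:, on_retry: ->(_attempt) {})
  attempt = 0
  begin
    attempt += 1
    yield
    attempt
  rescue DeviceNotFound
    raise if attempt >= max_attempts
    on_retry.call(attempt)
    retry
  end
end

# Simulate the user completing registration after the second failed attempt.
register_late = ->(attempt) { DEVICES["abc"] = {} if attempt == 2 }
attempts = run_with_retries(max_attempts: 5, on_retry: register_late) do
  handle_hardware_info("abc", { "id" => 42 })
end
puts "saved after #{attempts} attempts: #{DEVICES['abc'].inspect}"
```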

fablabbcn / smartcitizen-api

Deferred retry of hardware_info handling when device not found #317