dcaputo-harmoni opened this issue 1 year ago
Hi, we did indeed change the state manager's object and switch it from uuids to ids, as you mentioned, to make the implementation slightly more performant while using less space in Redis. In previous API versions the heartbeat mechanism was quite forgiving of such migration errors. We recently revamped it to avoid having the DB and Redis device heartbeat states go out of sync, and it now fails on such unexpected cases so that we notice them when they occur.
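To make the uuid-to-id switch concrete, here is a minimal sketch (not open-balena-api code) that tells a legacy uuid reference apart from a numeric id. The formats are assumptions: ids are taken to be plain decimal digits and uuids to be hex strings of 32 or 62 characters; the function name is hypothetical.

```python
import re

# Assumed formats: current entries reference a device by numeric id, legacy
# entries by a hex uuid of 32 or 62 characters.
_ID_RE = re.compile(r"[0-9]+")
_UUID_RE = re.compile(r"[0-9a-f]{32}|[0-9a-f]{62}")


def classify_device_ref(ref: str) -> str:
    """Return "id", "uuid" or "unknown" for a device reference string.

    Heuristic only: an all-digit 32-character uuid would be reported as "id".
    """
    ref = ref.strip()
    if _ID_RE.fullmatch(ref):
        return "id"
    if _UUID_RE.fullmatch(ref.lower()):
        return "uuid"
    return "unknown"


if __name__ == "__main__":
    for sample in ["12345", "a" * 32, "not-a-device"]:
        print(sample, "->", classify_device_ref(sample))
```

An entry classified as "uuid" would be the kind of leftover that the stricter heartbeat handling now rejects instead of silently ignoring.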
In general, we suggest upgrading one major version at a time, and I think in this case that would have helped avoid this issue.
One way to get your device state back in sync between the DB and Redis after v11.7.0 is to set the API_HEARTBEAT_STATE_ONLINE_UPDATE_CACHE_TIMEOUT env var to any non-negative integer value (in milliseconds).
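If you want to sanity-check a value before deploying it, a small helper like the one below applies the only rule stated here: a non-negative integer number of milliseconds. This is a hypothetical pre-deploy check, not how open-balena-api itself parses the variable.

```python
import os

# Variable name from the comment above; the validation rule is an assumption
# based only on "any non-negative integer value (milliseconds)".
VAR = "API_HEARTBEAT_STATE_ONLINE_UPDATE_CACHE_TIMEOUT"


def check_cache_timeout(raw: str) -> int:
    """Validate a candidate value and return it as an int in milliseconds."""
    value = int(raw.strip())  # raises ValueError for non-integers such as "1.5"
    if value < 0:
        raise ValueError(f"{VAR} must be non-negative, got {value}")
    return value


if __name__ == "__main__":
    raw = os.environ.get(VAR)
    if raw is None:
        print(f"{VAR} is not set")
    else:
        print(f"{VAR} = {check_cache_timeout(raw)} ms")
```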
I do see that you already have a PR for this, though, so you may no longer be in this inconsistent state. Treat API_HEARTBEAT_STATE_ONLINE_UPDATE_CACHE_TIMEOUT as a suggestion in case you face a similar issue in the future.
I'm receiving the message below for a group of devices exactly every 30 seconds when running open-balena-api 11.8.3. I upgraded from an older version (0.194.0), so I'm wondering whether it could be caused by items that were in the device-online-state Redis queue at the time of cutover and were entered by the older API. I believe this queue previously used uuid to identify devices, and that was changed to id at some point between my last version and the current one. I'm assuming the 30-second frequency is driven by RSMQ_READ_TIMEOUT being set to 30 seconds.

Separately, does anyone know whether I can address this by just flushing the device state queue in Redis? If so, how would I go about doing this? Presumably I could just run FLUSHALL on my Redis instance, but I'm not sure whether that would have negative effects elsewhere.
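FLUSHALL deletes every key in every database of the Redis instance, so a narrower option is to remove only keys under one prefix. The sketch below assumes the relevant keys all start with "device-online-state:" (inferred from the key name quoted in this issue) and that the client behaves like a redis-py client. Whether open-balena-api recreates the queue on startup after its keys are deleted is not confirmed here, so stop the API and back up Redis first.

```python
from typing import List


def flush_prefix(client, prefix: str = "device-online-state:", dry_run: bool = True) -> List[str]:
    """Delete only the keys whose names start with `prefix`.

    `client` is assumed to behave like a redis-py client, i.e. to provide
    scan_iter(match=..., count=...) and delete(*names). With dry_run=True
    (the default) nothing is deleted and the matching keys are only returned.
    """
    # SCAN walks the keyspace incrementally instead of blocking like KEYS;
    # it may yield a key more than once, so collect into a set.
    keys = sorted({
        k.decode() if isinstance(k, bytes) else k
        for k in client.scan_iter(match=prefix + "*", count=500)
    })
    if keys and not dry_run:
        client.delete(*keys)
    return keys
```

For example, flush_prefix(redis.Redis(host="localhost", port=6379)) would only list the matching keys (host and port are placeholders); passing dry_run=False performs the deletion.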
One further note: when I view the device-online-state:expired:Q hash in Redis, I get the following (truncated, but there is more below with uuid).
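To see how many entries in that hash still look like legacy uuid-based messages, the sketch below splits an HGETALL result into messages and counts them. It assumes the standard RSMQ layout for a "<ns>:<queue>:Q" hash (fixed metadata fields, one field per message id holding the body, plus "<id>:rc" and "<id>:fr" counters) and treats any body containing a 32- or 62-character hex run as a uuid entry. Both are assumptions, and the function name is hypothetical. With redis-py, pass decode_responses=True so that hgetall returns str rather than bytes.

```python
import re
from typing import Dict, Tuple

# Assumed RSMQ metadata fields stored alongside the messages in the hash.
META_FIELDS = {"vt", "delay", "maxsize", "totalrecv", "totalsent", "created", "modified"}
HEX_UUID = re.compile(r"[0-9a-f]{32}|[0-9a-f]{62}")


def summarize_queue_hash(fields: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Split an HGETALL result into message bodies and a per-format count."""
    messages = {
        k: v for k, v in fields.items()
        if k not in META_FIELDS and not k.endswith((":rc", ":fr"))
    }
    counts = {"uuid": 0, "other": 0}
    for body in messages.values():
        # Heuristic: any long hex run is taken to be a legacy device uuid.
        counts["uuid" if HEX_UUID.search(body.lower()) else "other"] += 1
    return messages, counts
```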