balena-io / open-balena-api

The core API of openBalena
https://balena.io/open
GNU Affero General Public License v3.0

Rate limiter should not 'just' drop metrics update without letting device know #1006

Open jellyfish-bot opened 2 years ago

jellyfish-bot commented 2 years ago

[fisehara] Fixed with Supervisor release 14.4.1

Rare cases of device metrics not updating are likely related to how the API rate-limits (throttles) metrics updates: https://github.com/balena-io/open-balena-api/blob/6dd540d7f5b397089b7f1ca2fe19d9f4aa8864a6/src/features/device-state/routes/state-patch-v2.ts#L217-L225 The throttled metrics are simply dropped, and the device is not told about it. Because the update still appears to succeed, the device assumes its CPU usage value was stored. It does not try again, and it then fails to detect later CPU usage changes, because it thinks it has already reported the significant change to the API: https://github.com/balena-os/balena-supervisor/blob/790259560ae9cad433f05deb6b1ca40f805545d8/src/api-binder/report.ts#L81-L86
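The interaction described above can be modelled in a few lines. This is only a minimal sketch of the suspected failure mode, not the actual open-balena-api or Supervisor code: the class names, the 60-second throttle window, and the significance threshold of 20 are all hypothetical values chosen for illustration.

```typescript
// Hypothetical model of the suspected failure mode. None of these names
// or numbers come from open-balena-api or balena-supervisor.

type Report = { cpu_usage?: number; status?: string };

// API side: accepts every state patch, but silently skips the metrics
// while the throttle window is active. The response looks the same either way.
class ThrottlingApi {
  stored: Report = {};
  private lastMetricsWriteMs = -Infinity;
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  patch(report: Report, nowMs: number): { ok: boolean } {
    const { cpu_usage, ...rest } = report;
    Object.assign(this.stored, rest);
    if (cpu_usage !== undefined && nowMs - this.lastMetricsWriteMs >= this.windowMs) {
      this.stored.cpu_usage = cpu_usage;
      this.lastMetricsWriteMs = nowMs;
    }
    // If the metric was throttled, it is dropped here and the caller is not told.
    return { ok: true };
  }
}

// Device side: caches what it believes was reported and only sends a
// metric again when it differs from the cached value by at least `threshold`.
class Device {
  private lastReportedCpu: number | undefined;
  private readonly api: ThrottlingApi;
  private readonly threshold: number;

  constructor(api: ThrottlingApi, threshold = 20) {
    this.api = api;
    this.threshold = threshold;
  }

  // Returns true if a patch was sent, false if the change was not significant.
  report(cpu: number, nowMs: number): boolean {
    if (
      this.lastReportedCpu !== undefined &&
      Math.abs(cpu - this.lastReportedCpu) < this.threshold
    ) {
      return false;
    }
    const res = this.api.patch({ cpu_usage: cpu }, nowMs);
    if (res.ok) {
      this.lastReportedCpu = cpu; // cached even when the API dropped the value
    }
    return true;
  }
}

const api = new ThrottlingApi(60_000);
const device = new Device(api);

device.report(10, 0); // stored by the API
device.report(90, 1_000); // throttled and dropped, but the device caches 90
device.report(88, 120_000); // throttle window is over, but 88 is "not significant" versus 90

console.log(api.stored.cpu_usage); // 10
```

In this model the API keeps showing the stale value until the device's CPU usage moves far enough away from the value it wrongly believes was stored.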

This still does not explain the observation that a manual device state patch failed. Unfortunately, we were not able to reproduce that.

kb2ma commented 1 year ago

balena-os/balena-supervisor#2049 hard-codes the metrics-only reporting interval to 300 seconds (5 minutes). Metrics are also reported if some other state change occurs before this interval elapses.
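As a rough illustration of that rule, the decision could look like the sketch below. The function and type names are hypothetical and this is not the code from the pull request; only the 300-second interval comes from the comment above.

```typescript
// Interval for metrics-only reports: 300 seconds, as stated in the comment.
const METRICS_INTERVAL_MS = 300 * 1000;

type Decision = { send: boolean; includeMetrics: boolean };

// Hypothetical helper: decides whether a report goes out and whether it
// carries metrics.
function decideReport(
  stateChanged: boolean,
  lastMetricsReportMs: number,
  nowMs: number,
): Decision {
  if (stateChanged) {
    // Any other state change carries the metrics along, regardless of the interval.
    return { send: true, includeMetrics: true };
  }
  // Metrics-only report: wait until the full interval has elapsed.
  const due = nowMs - lastMetricsReportMs >= METRICS_INTERVAL_MS;
  return { send: due, includeMetrics: due };
}

console.log(decideReport(false, 0, 120_000)); // metrics only, 2 minutes in: nothing sent
console.log(decideReport(true, 0, 120_000)); // state change: metrics ride along
```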