Open oscgonfer opened 1 year ago
Adding to this topic, one possibility would be to implement simple device metrics, as already suggested here https://github.com/fablabbcn/smartcitizen-api/issues/100#issuecomment-446579722, for those checks that can be done in the platform.
A proposal could be to add a health table linked to the device, which would contain:
health:
  # on device data ingestion, calculated by the platform
  total_data_points: # number of data points in total
  data_gaps: # % of data gaps in the whole period based on sample interval (to retrieve from hardware info?)
  missing_sensors: # list of sensors that have been present, but aren't anymore
  # filled from a health topic on the mqtt. JSON directly to allow flexibility
  hardware_report: # JSON sent directly from the hardware
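As a rough illustration of the platform-side fields above, here is a minimal Python sketch. The function name `platform_health`, the input shape (a list of `(timestamp, {sensor: value})` posts) and the exact definition of `data_gaps` (share of expected posts that never arrived) are assumptions made for this example, not something the platform defines today.

```python
from datetime import datetime, timedelta


def platform_health(readings, sample_interval_s):
    """Compute the platform-side health fields for one device.

    readings: list of (timestamp: datetime, values: dict of sensor -> value).
    sample_interval_s: expected seconds between posts (assumed to be known,
    e.g. reported by the hardware).
    """
    readings = sorted(readings, key=lambda r: r[0])
    if not readings:
        return {"total_data_points": 0, "data_gaps": 0.0, "missing_sensors": []}

    # Every sensor value in every post counts as one data point.
    total_data_points = sum(len(values) for _, values in readings)

    # Posts we would expect over the observed period at the given interval.
    span_s = (readings[-1][0] - readings[0][0]).total_seconds()
    expected_posts = int(span_s // sample_interval_s) + 1
    data_gaps = max(0.0, 100.0 * (1 - len(readings) / expected_posts))

    # Sensors seen at some point but absent from the most recent post.
    seen = set().union(*(values.keys() for _, values in readings))
    missing_sensors = sorted(seen - set(readings[-1][1].keys()))

    return {
        "total_data_points": total_data_points,
        "data_gaps": data_gaps,
        "missing_sensors": missing_sensors,
    }


t0 = datetime(2020, 1, 1)
example = [
    (t0, {"temperature": 21.0, "noise": 40.0}),
    (t0 + timedelta(seconds=60), {"temperature": 21.1, "noise": 41.0}),
    # the post expected at t0 + 120 s never arrived
    (t0 + timedelta(seconds=180), {"temperature": 21.3}),
]
report = platform_health(example, sample_interval_s=60)
```

Comparing only against the latest post keeps `missing_sensors` cheap to update at ingestion time, at the cost of flagging a sensor after a single missed post; a real implementation would probably want a tolerance.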
To be done at ingestion time by ahoy or a similar library. The kit's firmware will post the reading and publication intervals on boot or on config change (TBC), on a /device/<token>/config topic that would fill a config table per device.
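To make the topic layout concrete, the following Python sketch routes an incoming message on `/device/<token>/config` or `/device/<token>/health` into a per-device store. It is only an illustration: the `route_message` helper, the in-memory `store` dict standing in for the config and health tables, and the payload keys `reading_interval` and `publication_interval` are all hypothetical.

```python
import json
import re

# Matches /device/<token>/config and /device/<token>/health.
TOPIC_RE = re.compile(r"^/device/(?P<token>[^/]+)/(?P<kind>config|health)$")


def route_message(topic, payload, store):
    """Store the JSON payload under store[token][kind].

    Returns True if the topic was recognised, False otherwise.
    Invalid JSON raises json.JSONDecodeError, left to the caller to handle.
    """
    match = TOPIC_RE.match(topic)
    if match is None:
        return False
    record = json.loads(payload)
    # Keep only the latest report per device and kind.
    store.setdefault(match["token"], {})[match["kind"]] = record
    return True


store = {}
handled = route_message(
    "/device/abc123/config",
    '{"reading_interval": 60, "publication_interval": 180}',  # hypothetical keys
    store,
)
ignored = route_message("/device/abc123/readings", "{}", store)
```

Storing the payload as-is matches the idea of keeping the hardware report as free-form JSON, so the firmware can add fields without a platform change.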
This could also provide a metric that represents the variability of the posting interval, and raise a flag for a sensor that is not posting data regularly.
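One possible way to express that variability is the coefficient of variation (standard deviation divided by mean) of the time between consecutive posts. The sketch below assumes timestamps as epoch seconds; the function names and the `max_cv=0.5` flagging threshold are arbitrary choices for illustration, not agreed values.

```python
from statistics import mean, pstdev


def post_interval_variability(timestamps):
    """Coefficient of variation of the gaps between consecutive posts.

    timestamps: iterable of epoch seconds. Returns None when there are
    fewer than two intervals or the mean interval is zero.
    """
    ts = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(intervals) < 2:
        return None
    avg = mean(intervals)
    if avg == 0:
        return None
    return pstdev(intervals) / avg


def is_irregular(timestamps, max_cv=0.5):
    """Flag a device whose posting interval varies more than max_cv."""
    cv = post_interval_variability(timestamps)
    return cv is not None and cv > max_cv


regular = [0, 60, 120, 180]    # steady one-minute posts
irregular = [0, 60, 600, 660]  # a nine-minute hole in the middle
```

A value near 0 means the device posts like clockwork; because the metric is relative to the mean interval, the same threshold can be used for kits configured with different publication intervals.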
The kit's firmware will send data normally, and the platform needs to know what to expect. This is currently done through blueprints (kits), but we would like to change this as discussed in https://github.com/fablabbcn/smartcitizen-api/issues/241. The change would present a list of sensors to the user, during onboarding or on the kit edit page (device edit), where the user can select which sensors to expect and whether a notification should be sent when one of them has not been received within a certain threshold (related to the reading/publication intervals from above).
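The expected-sensor check described here could look roughly like the Python sketch below. Everything in it is an assumption for the sake of the example: the `sensors_to_notify` helper, the shape of the per-device selection (`sensor -> notify flag`), and the threshold expressed as a number of missed publication intervals.

```python
def sensors_to_notify(expected, last_seen, now, publication_interval_s, missed_posts=3):
    """Return the sensors that should trigger a notification.

    expected: dict of sensor name -> bool (user wants a notification).
    last_seen: dict of sensor name -> epoch seconds of the last reading.
    now: current time in epoch seconds.
    A sensor is overdue once it has been silent for more than
    missed_posts publication intervals, or was never seen at all.
    """
    threshold_s = missed_posts * publication_interval_s
    overdue = []
    for sensor, notify in expected.items():
        if not notify:
            continue  # user opted out of notifications for this sensor
        seen_at = last_seen.get(sensor)
        if seen_at is None or now - seen_at > threshold_s:
            overdue.append(sensor)
    return sorted(overdue)


expected = {"temperature": True, "noise": True, "light": False}
last_seen = {"temperature": 1000, "noise": 100}
overdue = sensors_to_notify(expected, last_seen, now=1060, publication_interval_s=180)
```

Tying the threshold to the publication interval reported by the firmware avoids a fixed timeout that would be too strict for slow-publishing kits and too lax for fast ones.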
The user could select notifications on this page, and flag misbehaving sensors in the frontend:
The kit's firmware would post at least these new sensors:
These shouldn't be presented in the frontend to avoid confusion, but would be supporting health diagnosis.
Summary of action points for now:
hardware_info table and MQTT topic for prototyping metrics coming directly from the hardware
This issue is to open the discussion about health metrics for a device. Currently we see recurring issues when devices are deployed, most commonly connectivity issues and hardware problems. We need an easier debugging process for users, which can be provided by metrics and analytics on the data, plus ad-hoc physical device metrics.
Initially, we are addressing this issue offline, with custom requests to the API, but down the line the process should be integrated into the platform for easier debugging.
To start with this, we suggest adding a property to the device indicating the device health, in which we can collect various metrics, some calculated on the physical device side and some on the platform side. Current proposals:
Platform checks
hardware_info for instance
Firmware checks
This could be sent on a /device/<token>/health MQTT topic, and ingested into the health table for later. It could be sent ad-hoc, or on boot:
@pral2a @vicobarberan please provide inputs to build it progressively.