Rethink Device, Kit, Component, Sensor

oscgonfer commented 1 year ago

This issue is to discuss about potential changes on the data models we currently use on the API.

Currently

There are two issues:

As explained here there is a limitation in the measurements to sensor to kit relationship, not allowing flexibility on the measurements for an already defined kit. Two proposals are mentioned there
We are growing in terms of possible sensors that can be connected to the kit, and each time we create a new possible combination, a new kit needs to be created. This has already brought us to 44 possible kits, out of which, 18 are variations of the SCK2.1

This is an important topic to improve, because new hardware versions will immediately duplicate those 18 kits just by simply changing the urban board. This will create an unmanageable amount of possibilities that users will need to navigate through from the Advanced kit selection.

Proposal

In a first stage, the proposal would be to change the components relation to be directly with device instead of kit, and modify components to already assess the problematic of measurements or sensors. kit would disappear as it is right now. In this concept, device components would not be created based on a pre-defined blueprint (or kit), but based on the hardware publications directly, which would pass through the readings_controller (or whichever) creating the components for the found sensors on the posts if they exist in the sensors table. New posts, with additional sensors connected, could dynamically be added with this approach. However, further discussion is needed to review what happens with the sensors / measurement issue.

In a second stage, the proposal is to consider the duplicity of sensors in the same device (aka the mux). This would mean we send a uuid of sensor on the hardware (based on sensor model and location) which we need to "parse". This could use a new propery in components which specifies an unique location for each sensor so that it can be uniquely located.

oscgonfer commented 1 year ago

Coming back to this issue as well, this means we can create also endpoints for sensors and measurements

timcowlishaw commented 1 year ago

Hey Oscar, a couple of questions on this, which we can discuss tomorrow:

There's a few important bits of data which would need new homes under this proposal. Firstly, the sensor_map json on the current kits table, which maps sensor ids to named 'keys' for that kit. If we can assume that a given sensor always takes the same key, that can be moved to a new column on sensors. If however a sensor can take a different id based on what device it's attached to - things get more complicated. They could be moved to components but then, where would they be populated from when the component is created? I think in that case we'd need some sort of kit like concept in order provide a template or prototype for these. The equation and reverse_equation columns on component have a similar issue - if these are always the same per sensor, we just move them to sensors, but if not, they'll need to be populated from somewhere when the component is created for a device (meaning, again, something that looks suspiciously like a kit to act as a repository of the appropriate info to use.
Would it cause problems for either us or end users if the ids or uuids of components changed? This is going to be a necessary consequence of this change (as unique components go from being one-per-kit to one-per-device), so it might be worth preparing ourselves for (and possibly announcing to users in the form of a deprecation warning) to manage what might well be a breaking change!

We'll chat it through tomorrow, but just wanted to give you a heads up :-)

thanks!

Tim

timcowlishaw commented 1 year ago

One little update on the above - we do indeed have two sensors which take different equations depending on which kit they're used as part of, so will need to dig into why, or find a way of dealing with this!

sc_dev=# select sensor_id, count(distinct equation), count(distinct reverse_equation) from components group by sensor_id having count(distinct equation) > 1 or count(distinct reverse_equation) > 1;
 sensor_id | count | count
-----------+-------+-------
         7 |     3 |     1
        10 |     2 |     1
(2 rows)

timcowlishaw commented 1 year ago

These are the sensors in question:

sc_dev=# select id, name, description from sensors where id in (7, 10);
 id |    name     |                               description
----+-------------+--------------------------------------------------------------------------
  7 | POM-3044P-R | Electret microphone with envelope follower sound pressure sensor (noise)
 10 | Battery SCK | Custom Circuit

timcowlishaw commented 1 year ago

(notes from chat)

keys: should always be the same. Equations: should always be the same. the cases above are from ancient versions of the boards, we can safely set the sensor equation to NULL and throw away the values. IDs: shouldn't matter, as components are effectively not part of the user facing data model. However, getting rid of 'kits' would be a breaking API change, so we need to think best about how to manage that.

timcowlishaw commented 1 year ago

(more notes) - leave 'kits' with an OPTIONAL relation from devices, for name and description. Move components relation to devices as agreed before.

timcowlishaw commented 1 year ago

Notes from meeting.

Sensor_map needs to be maintained to map between sensors and cassandra timeseries.

However, the key for a sensor varies depending on what kit it is a part of (It shouldn't, but it does - theses aren't actual semantic differences):

irb(main):070:0> Kit.all.map(&:sensor_map).map {|m|
        n = Hash.new {|h, k| h[k] = [] };
        m.each_pair {|k, v| n[v].push(k)};
        n }.reduce(Hash.new {|h, k| h[k] = [] }) { |h, r|
        r.each_pair {|k, v| h[k] += v };
        h; }.map { |k, v| [k, v.uniq] }.to_h.select {|k, v| v.length > 1 }
  Kit Load (1.3ms)  SELECT "kits".* FROM "kits"
=> {5=>["hum", "ext_h"], 4=>["temp", "ext_t"], 10=>["bat", "batt"], 13=>["h", "hum"], 12=>["t", "temp"], 158=>["co2", "scd30_f"], 161=>["ext_h", "scd30_t"], 160=>["ext_t", "scd30_h"], 165=>["pm_pn0.3", "pn_0.3"], 166=>["pm_pn0.5", "pn_0.5"], 167=>["pm_pn1.0", "pn_1.0"], 168=>["pm_pn2.5", "pn_2.5"], 169=>["pm_pn5.0", "pn_5.0"], 170=>["pm_pn10.0", "pn_10.0"]}
irb(main):076:0>

THEREFORE - we allow for an inheritence mechanism where a sensor has a default key which can be overridden on the component for a given device. That way, we can create components on demand for a device when a reading arrives, but also support the existing variation in keys between devices.

That way, the Kit concept can be dropped entirely - the name and description can be thrown away, and the name reconstructed from the device hw_info.

Does this sound about right @oscgonfer ?

oscgonfer commented 1 year ago

It does! Thanks for the detailed notes.

oscgonfer commented 1 year ago

Now that we are working on all of this, it would be good to understand some extra points (and maybe clean them up if suitable). It seems that we have quite a few timestamps that represent similar things in different places, as far as I noted:

Device Object

updated_at: last update of the device, normally by the user, or any post to the device endpoint. In any case, is the ActiveRecord takes care of that) I assume. It would be good to at least document it.
added_at: creation date (same as created_at) by ActiveRecord
last_reading_at: timestamp of the last reading that device has

Device.data

recorded_at: alias of last_reading_at in device (see https://github.com/fablabbcn/smartcitizen-api/blob/master/app/models/device.rb#L59 and https://github.com/fablabbcn/smartcitizen-api/blob/master/app/models/device.rb#L180)
added_at: alias created_at https://github.com/fablabbcn/smartcitizen-api/blob/master/app/models/device.rb#L58 ActiveRecord

Any insight on why so many aliases?

timcowlishaw commented 1 year ago

Hey hey @oscgonfer - I'm close to a first draft PR to go over when we're back at the end of August, but just wanted to make you aware of a small hitch in our plan (it might not be a big deal, but then again, it might be). I've discovered that the order of the columns in the exported CSV of a device's readings is defined by the sensor_map on Kit, so getting rid of Kit will change the column order in downloaded CSVs, unless we give components a position to make them an explicitly ordered list, and preserve the order that they're currently defined in the sensor map.

It seems like it might be an unneeded extra bit of complexity (especially given the original sensor_map order is more or less arbitrary), but I'm thinking that if we've got downstream users who rely on receiving that CSV in a consistent format for further processing, it's probably worth it?

Let me know what you think!

Thanks,

Tim

timcowlishaw commented 1 year ago

Re the dates - that can almost definitely be simplified. created_at and updated_at are rails defaults, and last_reading_at is ours (and is obviously useful). added_at, i have no idea, and at a glance seems to have an inconsistent definition across at least two different places. Perhaps we can get rid of it?

timcowlishaw commented 1 year ago

Hey Oscar!

Please see a draft PR here: https://github.com/fablabbcn/smartcitizen-api/pull/246 - there's a few small bits outstanding that I've detailed in the description, but i think the bulk of it is done and working.

I'm going to be away from next week 'til the 1st of September - I'm not sure when you're back, but if you happen have any questions before then, i may not be checking Slack, so drop me an email (or I guess a comment on here), otherwise we can catch up in September!

Thanks!

Tim

oscgonfer commented 1 year ago

Hey hey @oscgonfer - I'm close to a first draft PR to go over when we're back at the end of August, but just wanted to make you aware of a small hitch in our plan (it might not be a big deal, but then again, it might be). I've discovered that the order of the columns in the exported CSV of a device's readings is defined by the sensor_map on Kit, so getting rid of Kit will change the column order in downloaded CSVs, unless we give components a position to make them an explicitly ordered list, and preserve the order that they're currently defined in the sensor map.

It seems like it might be an unneeded extra bit of complexity (especially given the original sensor_map order is more or less arbitrary), but I'm thinking that if we've got downstream users who rely on receiving that CSV in a consistent format for further processing, it's probably worth it?

Let me know what you think!

Hi there!

So I am writing from the US now, that's why you will see this comment in weird hours... In any case, regarding the CSV, I believe that the order in which the columns are in is not a big deal, as long as they have "identifiers", which they have, normally the data processing would be done elsewhere, like python or R, so I wouldn't worry too much about the order. However, I would say that the current CSV header is not great, and that it would be a good time to unify formats with the CSV data that comes from the SCK. It might be a bit overkill, but users wouldn't need to have two different handlers for the same data, regardless the source. This means that we can adapt to what is done in the firmware, and have a predefined order (which now comes from the order in this file but it will change at some point soon because of this restructuring and be defined, for instance, by the sensor_id).

In addition, we should use ISO8601 for the datetime format.

At least, I would suggest that the header would go like in here, which would imply unifying the short_name and description in the firmware and the platform.

Long story short:

having a predefined order seems like the best solution, but we should review names and csv header structure in the platform side so that everything is a bit more "uniform"
As far as the usage of sensor_map, we can bypass it by simply doing it in alphabetical order for now or any other way that you deem easier, and then we revisit it once we reconsider firmware names
In parallel, ISO8601 datetime format would be great, and make sure we have it all in UTC.

oscgonfer commented 1 year ago

Hey Oscar!

Please see a draft PR here: #246 - there's a few small bits outstanding that I've detailed in the description, but i think the bulk of it is done and working.

I'm going to be away from next week 'til the 1st of September - I'm not sure when you're back, but if you happen have any questions before then, i may not be checking Slack, so drop me an email (or I guess a comment on here), otherwise we can catch up in September!

Thanks!

Tim

Let's talk in September! Have a good summer break!

oscgonfer commented 1 year ago

Notes:

if there is nothing against it (@pral2a) we will remove added_at date and remove the dates recorded_at and added_at from the device.data field. Also we are thinking that it's better to move data to the first level of the tree but we will discuss this before

oscgonfer commented 1 year ago

Discussion now moved to PR. Let's keep it there to avoid duplicity: https://github.com/fablabbcn/smartcitizen-api/pull/246

timcowlishaw commented 6 months ago

Merged!

fablabbcn / smartcitizen-api

Rethink Device, Kit, Component, Sensor #241