Closed oscgonfer closed 6 months ago
Coming back to this issue as well, this means we can create also endpoints for sensors
and measurements
Hey Oscar, a couple of questions on this, which we can discuss tomorrow:
There's a few important bits of data which would need new homes under this proposal. Firstly, the sensor_map
json on the current kits
table, which maps sensor ids to named 'keys' for that kit. If we can assume that a given sensor always takes the same key, that can be moved to a new column on sensors
. If however a sensor can take a different id based on what device it's attached to - things get more complicated. They could be moved to components
but then, where would they be populated from when the component is created? I think in that case we'd need some sort of kit
like concept in order provide a template or prototype for these. The equation
and reverse_equation
columns on component
have a similar issue - if these are always the same per sensor, we just move them to sensors
, but if not, they'll need to be populated from somewhere when the component
is created for a device (meaning, again, something that looks suspiciously like a kit
to act as a repository of the appropriate info to use.
Would it cause problems for either us or end users if the ids
or uuids
of components changed? This is going to be a necessary consequence of this change (as unique components go from being one-per-kit to one-per-device), so it might be worth preparing ourselves for (and possibly announcing to users in the form of a deprecation warning) to manage what might well be a breaking change!
We'll chat it through tomorrow, but just wanted to give you a heads up :-)
thanks!
Tim
One little update on the above - we do indeed have two sensors which take different equations depending on which kit they're used as part of, so will need to dig into why, or find a way of dealing with this!
sc_dev=# select sensor_id, count(distinct equation), count(distinct reverse_equation) from components group by sensor_id having count(distinct equation) > 1 or count(distinct reverse_equation) > 1;
sensor_id | count | count
-----------+-------+-------
7 | 3 | 1
10 | 2 | 1
(2 rows)
These are the sensors in question:
sc_dev=# select id, name, description from sensors where id in (7, 10);
id | name | description
----+-------------+--------------------------------------------------------------------------
7 | POM-3044P-R | Electret microphone with envelope follower sound pressure sensor (noise)
10 | Battery SCK | Custom Circuit
(notes from chat)
keys: should always be the same. Equations: should always be the same. the cases above are from ancient versions of the boards, we can safely set the sensor equation to NULL and throw away the values. IDs: shouldn't matter, as components are effectively not part of the user facing data model. However, getting rid of 'kits' would be a breaking API change, so we need to think best about how to manage that.
(more notes) - leave 'kits' with an OPTIONAL relation from devices, for name and description. Move components relation to devices as agreed before.
Notes from meeting.
Sensor_map needs to be maintained to map between sensors and cassandra timeseries.
However, the key for a sensor varies depending on what kit it is a part of (It shouldn't, but it does - theses aren't actual semantic differences):
irb(main):070:0> Kit.all.map(&:sensor_map).map {|m|
n = Hash.new {|h, k| h[k] = [] };
m.each_pair {|k, v| n[v].push(k)};
n }.reduce(Hash.new {|h, k| h[k] = [] }) { |h, r|
r.each_pair {|k, v| h[k] += v };
h; }.map { |k, v| [k, v.uniq] }.to_h.select {|k, v| v.length > 1 }
Kit Load (1.3ms) SELECT "kits".* FROM "kits"
=> {5=>["hum", "ext_h"], 4=>["temp", "ext_t"], 10=>["bat", "batt"], 13=>["h", "hum"], 12=>["t", "temp"], 158=>["co2", "scd30_f"], 161=>["ext_h", "scd30_t"], 160=>["ext_t", "scd30_h"], 165=>["pm_pn0.3", "pn_0.3"], 166=>["pm_pn0.5", "pn_0.5"], 167=>["pm_pn1.0", "pn_1.0"], 168=>["pm_pn2.5", "pn_2.5"], 169=>["pm_pn5.0", "pn_5.0"], 170=>["pm_pn10.0", "pn_10.0"]}
irb(main):076:0>
THEREFORE - we allow for an inheritence mechanism where a sensor
has a default key
which can be overridden on the component
for a given device. That way, we can create components
on demand for a device when a reading arrives, but also support the existing variation in keys between devices.
That way, the Kit concept can be dropped entirely - the name and description can be thrown away, and the name reconstructed from the device hw_info.
Does this sound about right @oscgonfer ?
It does! Thanks for the detailed notes.
Now that we are working on all of this, it would be good to understand some extra points (and maybe clean them up if suitable). It seems that we have quite a few timestamps that represent similar things in different places, as far as I noted:
Device Object
Device.data
Any insight on why so many aliases?
Hey hey @oscgonfer - I'm close to a first draft PR to go over when we're back at the end of August, but just wanted to make you aware of a small hitch in our plan (it might not be a big deal, but then again, it might be). I've discovered that the order of the columns in the exported CSV of a device's readings is defined by the sensor_map
on Kit, so getting rid of Kit
will change the column order in downloaded CSVs, unless we give components
a position
to make them an explicitly ordered list, and preserve the order that they're currently defined in the sensor map.
It seems like it might be an unneeded extra bit of complexity (especially given the original sensor_map
order is more or less arbitrary), but I'm thinking that if we've got downstream users who rely on receiving that CSV in a consistent format for further processing, it's probably worth it?
Let me know what you think!
Thanks,
Tim
Re the dates - that can almost definitely be simplified. created_at and updated_at are rails defaults, and last_reading_at is ours (and is obviously useful). added_at, i have no idea, and at a glance seems to have an inconsistent definition across at least two different places. Perhaps we can get rid of it?
Hey Oscar!
Please see a draft PR here: https://github.com/fablabbcn/smartcitizen-api/pull/246 - there's a few small bits outstanding that I've detailed in the description, but i think the bulk of it is done and working.
I'm going to be away from next week 'til the 1st of September - I'm not sure when you're back, but if you happen have any questions before then, i may not be checking Slack, so drop me an email (or I guess a comment on here), otherwise we can catch up in September!
Thanks!
Tim
Hey hey @oscgonfer - I'm close to a first draft PR to go over when we're back at the end of August, but just wanted to make you aware of a small hitch in our plan (it might not be a big deal, but then again, it might be). I've discovered that the order of the columns in the exported CSV of a device's readings is defined by the
sensor_map
on Kit, so getting rid ofKit
will change the column order in downloaded CSVs, unless we givecomponents
aposition
to make them an explicitly ordered list, and preserve the order that they're currently defined in the sensor map.It seems like it might be an unneeded extra bit of complexity (especially given the original
sensor_map
order is more or less arbitrary), but I'm thinking that if we've got downstream users who rely on receiving that CSV in a consistent format for further processing, it's probably worth it?Let me know what you think!
Hi there!
So I am writing from the US now, that's why you will see this comment in weird hours...
In any case, regarding the CSV, I believe that the order in which the columns are in is not a big deal, as long as they have "identifiers", which they have, normally the data processing would be done elsewhere, like python or R, so I wouldn't worry too much about the order. However, I would say that the current CSV header
is not great, and that it would be a good time to unify formats with the CSV data that comes from the SCK. It might be a bit overkill, but users wouldn't need to have two different handlers
for the same data, regardless the source. This means that we can adapt to what is done in the firmware, and have a predefined order (which now comes from the order in this file but it will change at some point soon because of this restructuring and be defined, for instance, by the sensor_id
).
In addition, we should use ISO8601 for the datetime format.
At least, I would suggest that the header would go like in here, which would imply unifying the short_name
and description
in the firmware and the platform.
Long story short:
sensor_map
, we can bypass it by simply doing it in alphabetical order for now or any other way that you deem easier, and then we revisit it once we reconsider firmware namesHey Oscar!
Please see a draft PR here: #246 - there's a few small bits outstanding that I've detailed in the description, but i think the bulk of it is done and working.
I'm going to be away from next week 'til the 1st of September - I'm not sure when you're back, but if you happen have any questions before then, i may not be checking Slack, so drop me an email (or I guess a comment on here), otherwise we can catch up in September!
Thanks!
Tim
Let's talk in September! Have a good summer break!
Notes:
added_at
date and remove the dates recorded_at
and added_at
from the device.data
field. Also we are thinking that it's better to move data
to the first level of the tree but we will discuss this beforeDiscussion now moved to PR. Let's keep it there to avoid duplicity: https://github.com/fablabbcn/smartcitizen-api/pull/246
Merged!
This issue is to discuss about potential changes on the data models we currently use on the API.
Currently
There are two issues:
This is an important topic to improve, because new hardware versions will immediately duplicate those 18 kits just by simply changing the urban board. This will create an unmanageable amount of possibilities that users will need to navigate through from the Advanced kit selection.
Proposal
In a first stage, the proposal would be to change the
components
relation to be directly withdevice
instead ofkit
, and modifycomponents
to already assess the problematic ofmeasurements
orsensors
.kit
would disappear as it is right now. In this concept,device
components would not be created based on a pre-defined blueprint (or kit), but based on the hardware publications directly, which would pass through thereadings_controller
(or whichever) creating the components for the found sensors on the posts if they exist in the sensors table. New posts, with additional sensors connected, could dynamically be added with this approach. However, further discussion is needed to review what happens with thesensors
/measurement
issue.In a second stage, the proposal is to consider the duplicity of sensors in the same device (aka
the mux
). This would mean we send a uuid of sensor on the hardware (based on sensor model and location) which we need to "parse". This could use a new propery incomponents
which specifies an uniquelocation
for eachsensor
so that it can be uniquely located.