Closed: nathanprayzo closed this issue 9 years ago
I think it'd add a lot of overhead when adding devices to have a column for each device, and it wouldn't scale too well. I think sticking with something similar to the following would be alright, coupled with a postgres function (like the one below) to append readings to the json field.
{
  "123": {
    "ph": 3,
    "tds": 120
  }
}

{
  "reading": {
    "created_at": 123,
    "sensor_data": {
      "ph": 4,
      "tds": 120,
      "temp": 23
    }
  }
}
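For what it's worth, the second shape above could map onto Go types along these lines. The struct and field names here are illustrative assumptions on my part, not anything settled in the codebase:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SensorData mirrors the proposed "sensor_data" object.
type SensorData struct {
	PH   float64 `json:"ph"`
	TDS  float64 `json:"tds"`
	Temp float64 `json:"temp"`
}

// Reading mirrors the proposed "reading" envelope.
type Reading struct {
	CreatedAt  int64      `json:"created_at"`
	SensorData SensorData `json:"sensor_data"`
}

// decodeReading unmarshals the outer {"reading": {...}} wrapper.
func decodeReading(raw []byte) (Reading, error) {
	var envelope struct {
		Reading Reading `json:"reading"`
	}
	err := json.Unmarshal(raw, &envelope)
	return envelope.Reading, err
}

func main() {
	raw := []byte(`{"reading":{"created_at":123,"sensor_data":{"ph":4,"tds":120,"temp":23}}}`)
	r, err := decodeReading(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.SensorData.PH, r.SensorData.Temp) // 4 23
}
```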
CREATE EXTENSION "uuid-ossp";

CREATE TABLE device (
    -- id is for internal use as a primary key
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    -- Device-generated identifier
    identifier character varying NOT NULL,
    -- timestamps which are set to the current time in UTC on record creation
    updated_at timestamp without time zone NOT NULL DEFAULT (now() AT TIME ZONE 'UTC'),
    created_at timestamp without time zone NOT NULL DEFAULT (now() AT TIME ZONE 'UTC')
);

CREATE TABLE reading (
    device_id uuid NOT NULL REFERENCES device (id) ON DELETE CASCADE,
    readings json NOT NULL DEFAULT '{}'::json
);

CREATE INDEX reading_device_id ON reading (device_id);
-- SQL function to add a key to json field
CREATE OR REPLACE FUNCTION "json_object_set_key"(
"json" json,
"key_to_set" TEXT,
"value_to_set" anyelement
)
RETURNS json
LANGUAGE sql
IMMUTABLE
STRICT
AS $function$
SELECT COALESCE(
(SELECT ('{' || string_agg(to_json("key") || ':' || "value", ',') || '}')
FROM (SELECT *
FROM json_each("json")
WHERE "key" <> "key_to_set"
UNION ALL
SELECT "key_to_set", to_json("value_to_set")) AS "fields"),
'{}'
)::json
$function$;
I think another option is postgres' HSTORE column type, which is a hash/key-value type. Most of the work on jsonb came out of improvements made to hstore for 9.4. The only limitation is that nested hashes aren't available on hstore until 9.4. It may be worth trying both types out and seeing what sort of flexibility each gives us, and there are always functions in postgres to convert between the two as well. Since postgres 9.4 isn't in RC yet (but I imagine it will be within the next month) we may need to run it on a test server. Heroku is usually good about adding support for the latest versions once they're out of beta though.
Here's the blog of the people who do most of the hstore and json work for postgres: http://obartunov.livejournal.com/175235.html
Edit: My point in suggesting hstore is that I believe it natively supports updating a column with new key/value pairs.
@letsgitgrowing check out this issue on the postgres driver we're using. An example of a custom json encode/decoder. https://github.com/lib/pq/issues/230
I think that we'll need to add an encoder (MarshalJSON) to the MyTime type to go from the database value to serialized JSON.
@letsgitgrowing this is an interesting project for managing database migrations and schema. Rails and other frameworks typically have a mechanism for managing these kinds of things, and this seems to be a similar approach.
I haven't looked into it much, but if it has hooks that we could call in our code to set up and tear down the database, then we could replace our `create table` and `drop table` calls in the database tests.
@letsgitgrowing goose is ready to go, at least in development. I'm ready to push it to production to test out - is the data in production necessary or can I drop the tables? I'd like to test it out on heroku before I merge #21.
@crakalakin Regarding the schema for table 'reading', shouldn't the device_id column be a foreign key to device.identifier instead of to device.id?
CREATE TABLE reading ( device_id uuid NOT NULL REFERENCES device (identifier) ON DELETE CASCADE, readings json NOT NULL DEFAULT '{}'::json );
^ Disregard this comment
@crakalakin I have almost finished all of the sub-tasks for this issue but ran into a little snag modifying the handler functions.
The monitoring system is only going to be aware of its device.identifier, but in order for its SensorData to be added to the database, the application needs to know the monitoring device's id. As such, wouldn't the application need to perform a device.identifier -> device.id lookup each time a new reading is added to the database? This seems really inefficient.
Am I missing something?
@letsgitgrowing the way I envisioned it was passing the device identifier through to the model, and then the database can look up the id on update:
UPDATE reading
SET readings = json_object_set_key(readings, '2014-11-05T21:00:00Z', '{"ph":4, "tds":121}'::json)
WHERE device_id = ( -- get device_id using the device.identifier
    SELECT id
    FROM device
    WHERE identifier = 'ABC123'
);
I thought about using the actual device-generated identifier as the primary key for that table, but I would imagine that indexing on it would be pretty huge since it's a long string.
See this comment on the PR for more.
@crakalakin Alright, I'll implement it that way; we can work on optimizing after the system is functional and we have reliable benchmarks.
Thanks!
Sounds good!
@letsgitgrowing try this. The tests pass for me when I use it:
b, err := json.Marshal(r.SensorData)
if err != nil {
    return err
}
_, err = m.db.Exec(`
    UPDATE reading
    SET readings = json_object_set_key(readings, $1, $2::json)
    WHERE device_id = (
        SELECT id
        FROM device
        WHERE identifier = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a22'
    )`, t, b)
@letsgitgrowing here's my first stab at getting json from the database into a struct:
func TestPostgresAddReading(t *testing.T) {
    //...
    var s []byte
    err = manager.db.QueryRow(`
        select to_json(readings->>$1)
        from reading
        where device_id = (
            select id
            from device
            where identifier = 'ABC123'
        )`, now).Scan(&s)
    switch {
    case err == sql.ErrNoRows:
        t.Error("Error no rows in table reading")
    case err != nil:
        t.Error("Error", err)
    }
    var data interface{} // still working on making this a models.SensorData type
    decoder := json.NewDecoder(bytes.NewBuffer(s))
    if err = decoder.Decode(&data); err != nil {
        t.Error("Error decoding json:", err)
    }
    fmt.Println(data) // data is a map so data["ph"] etc would work also
    //...
}
This prints out:
{"ph":6.8,"tds":120,"water_temperature":78}
Edit: removed uint conversion function in favor of []byte
@letsgitgrowing also, this might come in handy:
select json_object_keys(readings)
from reading
where device_id = (
    select id
    from device
    where identifier = 'ABC123'
);
Returns:
json_object_keys
----------------------
2014-11-05T21:00:00Z
2014-12-01T23:00:00Z
(2 rows)
@letsgitgrowing I found a solution to go from JSON in postgres to a models.SensorData struct. I'm sure that at some point we can figure out a way to reduce the two different json.Unmarshal calls into one, but for now it works.
var s []byte
err = manager.db.QueryRow(`
    select to_json(readings->>$1)
    from reading
    where device_id = (
        select id
        from device
        where identifier = 'ABC123'
    )`, now).Scan(&s)
switch {
case err == sql.ErrNoRows:
    t.Error("Error no rows in table reading")
case err != nil:
    t.Error("Error", err)
}
// First unmarshal: unmarshal []byte into a string
var d string
err = json.Unmarshal(s, &d)
if err != nil {
    fmt.Println(err)
    fmt.Println(string(s))
}
// Second unmarshal: unmarshal the above string into SensorData
var bd models.SensorData
json.Unmarshal([]byte(d), &bd)
fmt.Println(bd)
Prints out:
{6.8 120 78}
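If we want to get down to one json.Unmarshal later, one avenue (an assumption on my part, not yet tested against the database) is that `to_json(readings->>$1)` double-encodes the value as a JSON string containing JSON text, while `readings->$1` should hand back the object itself. The difference, sketched in pure Go with the database round-trip stubbed out:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SensorData fields match the keys seen in the query output above.
type SensorData struct {
	PH               float64 `json:"ph"`
	TDS              float64 `json:"tds"`
	WaterTemperature float64 `json:"water_temperature"`
}

// fromDoubleEncoded handles what to_json(readings->>$1) returns:
// a JSON string whose contents are themselves JSON text, hence
// the two-step unmarshal.
func fromDoubleEncoded(b []byte) (SensorData, error) {
	var inner string
	if err := json.Unmarshal(b, &inner); err != nil {
		return SensorData{}, err
	}
	var sd SensorData
	err := json.Unmarshal([]byte(inner), &sd)
	return sd, err
}

// fromObject handles what readings->$1 would return: the object
// itself, needing only a single unmarshal.
func fromObject(b []byte) (SensorData, error) {
	var sd SensorData
	err := json.Unmarshal(b, &sd)
	return sd, err
}

func main() {
	double := []byte(`"{\"ph\":6.8,\"tds\":120,\"water_temperature\":78}"`)
	single := []byte(`{"ph":6.8,"tds":120,"water_temperature":78}`)
	a, _ := fromDoubleEncoded(double)
	b, _ := fromObject(single)
	fmt.Println(a == b) // true
}
```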
@letsgitgrowing we now have some really great operators on `jsonb`. For instance, the test we have which checks that we inserted readings by selecting the count of that key can be reduced to `select readings ? '2014-11-05T21:00:00Z'`, where the `?` operator returns true or false depending on whether the key is in the json. It also runs in 0.338ms.
@crakalakin after reading more into jsonb, it appears the main advantages of storing json in its binary form are faster data manipulation and indexing by the database engine. Since our current strategy is to do all of the data manipulation client-side, I'm not convinced that refactoring to use jsonb is a high priority.
What are your thoughts?
@letsgitgrowing I'd agree to hold off on switching the column type until this branch is merged into master, but I think it'd still be a good idea to switch to it eventually (I believe this is all that's required). `INSERT`ing may be a bit slower, but because the data is in a binary format it'd be faster to do something like pull all readings from between two timestamps. Also, I was seeing much faster `SELECT` statements when using `jsonb`, so pulling the data out may be faster in general.
I think we should move forward with the schema you outlined previously: