brianfoshee / aquaponics-data

Time series data storage thoughts #46

Closed brianfoshee closed 8 years ago

brianfoshee commented 9 years ago

@nathanprayzo

I've been thinking some about the types of problems we've been working on lately (nutrient monitoring, weather monitoring, solar energy monitoring) and they all have a very similar underlying data model, which is driven by time. We're currently using JSON inside of Postgres to store data for nutrient monitoring, but at times I feel like that solution is a bit hacky, even though it's certainly doing a good job right now.

I believe that if we can build a layer of abstraction that handles this time series data collection then we can have a system that would scale out for any of our data needs going forward. This could be used and grow with our current projects but also move into newer spaces as we expand more into monitoring.

This layer of abstraction would essentially be a service on top of some type of data storage that would be super fast at inserting data but also super fast at querying and serializing that data for consumption. Postgres JSONB does a good job at storing data, but I'm not convinced it's the best long-term solution for potentially massive amounts of time series data. I've been looking around at some options and I think that InfluxDB is worth looking into. It's an open source time series database, so it's specifically meant to store data over time. Its more common uses are server metrics and analytics, but I think our data needs fit right in. The project itself is written in Go, and it has a SQL-ish query language over data that is stored schemaless. It has a JSON API, so interfacing wouldn't be too difficult. Each datapoint can contain metadata, which in our case would contain device IDs for nutrient monitoring. Another option would be to explore building a similar time series data storage system on top of Postgres JSONB, using the built-in functions for operating on that data type.
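
To make the idea a bit more concrete, here is a rough sketch of the interface such a layer might expose: insert a point, then query a series by time range and metadata. Everything in it is hypothetical (`Point`, `SeriesStore`, the `nutrients` series name, the `device_id` tag), and the in-memory list just stands in for whatever backend ends up underneath, whether that is InfluxDB, Postgres JSONB, or something else.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(order=True)
class Point:
    """One reading: a timestamp plus metadata (tags) and measured values."""
    time: datetime
    tags: dict = field(compare=False, default_factory=dict)
    values: dict = field(compare=False, default_factory=dict)


class SeriesStore:
    """Hypothetical in-memory stand-in for the real storage backend."""

    def __init__(self):
        # series name -> list of Points kept sorted by time
        self._series = {}

    def insert(self, series, point):
        # insort keeps the list ordered, so out-of-order arrivals are fine
        insort(self._series.setdefault(series, []), point)

    def query(self, series, start, end, **tags):
        """Return points with start <= time < end whose tags all match."""
        points = self._series.get(series, [])
        times = [p.time for p in points]
        lo = bisect_left(times, start)
        hi = bisect_left(times, end)
        return [p for p in points[lo:hi]
                if all(p.tags.get(k) == v for k, v in tags.items())]


store = SeriesStore()
t0 = datetime(2015, 1, 1, tzinfo=timezone.utc)
store.insert("nutrients", Point(t0, {"device_id": "tank-1"}, {"ph": 6.8}))
store.insert("nutrients", Point(t0 + timedelta(minutes=5),
                                {"device_id": "tank-2"}, {"ph": 7.1}))
tank1 = store.query("nutrients", t0, t0 + timedelta(hours=1),
                    device_id="tank-1")
```

The point of keeping the interface this small is that the projects using it would not need to change if the storage underneath were swapped out later.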

Some other projects I've been wanting to look into require time series data collection, and if we had a solution in place to basically drop into those projects then I think we could move pretty quickly on different ideas. One idea is to flesh out the solar energy monitoring with a nice dashboard and see what the market might be for doing installs at the residential level. Another is frontend website load time monitoring, now that browsers implement the Navigation Timing and Resource Timing APIs to collect information on how long a webpage took to load and render, and how long each resource (CSS/JS/images) took to load.
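
For the load time idea, a minimal sketch of the server side might look like the following. It assumes the browser sends back the `performance.timing` object from the Navigation Timing API as JSON, with its standard epoch-millisecond attributes (`navigationStart`, `responseStart`, `domContentLoadedEventEnd`, `loadEventEnd`). The function name, the output keys, and the sample numbers are made up for illustration.

```python
def page_load_metrics(timing):
    """Derive durations in milliseconds from Navigation Timing fields.

    `timing` is assumed to be a dict of epoch-millisecond values as
    reported by the browser's performance.timing object.
    """
    start = timing["navigationStart"]
    return {
        "time_to_first_byte_ms": timing["responseStart"] - start,
        "dom_content_loaded_ms": timing["domContentLoadedEventEnd"] - start,
        "page_load_ms": timing["loadEventEnd"] - start,
    }


# Made-up sample payload, not real measurements.
sample = {
    "navigationStart": 1000,
    "responseStart": 1180,
    "domContentLoadedEventEnd": 1900,
    "loadEventEnd": 2450,
}
metrics = page_load_metrics(sample)
```

Each derived metric would then be stored as one more time series point, tagged with something like the page URL.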

Let me know your thoughts.

brianfoshee commented 8 years ago

Dropping this link in here. I found it interesting and would like to look into it more: http://stackoverflow.com/questions/4814167/storing-time-series-data-relational-or-non

brianfoshee commented 8 years ago

Here's a recent article in favor of Elasticsearch for time series data: https://www.elastic.co/blog/elasticsearch-as-a-time-series-data-store

And a follow-up discussion about time series data in general: https://news.ycombinator.com/item?id=10560635&utm_term=comment

brianfoshee commented 8 years ago

Closing this for now. I think we learned a good bit. Opening a new issue with Graphite info.