Closed by neilh10 7 months ago
I guess I'm not clear is MMW / ODM2 using timescaledb or influxdb . Influxdb is referenced in the architecture diagram outlined 4yrs ago https://github.com/ODM2/ODM2DataSharingPortal/blob/main/doc/ArchitectureDiagram/Data%20Sharing%20Portal%20Architecture%20with%20Logos%20-%20Copy.png
@neilh10, we migrated from InfluxDB to the PostgreSQL Timescale extension (i.e., TimescaleDB) with our v0.12 release in December 2021. We've unfortunately gotten behind on updating all of our documentation to reflect the cumulative changes since then (outside of our release notes).
To answer the original question, we're compressing old data into 90-day chunks that are auto-created every 90 days, with a 1-month buffer, so there is always at least 1 month of uncompressed data. See https://github.com/ODM2/ODM2DataSharingPortal/issues/502#issuecomment-942649040
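For reference, a minimal sketch of what such a setup might look like in TimescaleDB, assuming a hypothetical `sensor_values` hypertable (the actual table and column names in ODM2DataSharingPortal may differ):

```sql
-- Hypothetical table/column names; intervals match the description above.
SELECT create_hypertable('sensor_values', 'value_datetime',
                         chunk_time_interval => INTERVAL '90 days');

ALTER TABLE sensor_values SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id'
);

-- Compress chunks once all their data is older than 1 month,
-- leaving at least 1 month of data uncompressed.
SELECT add_compression_policy('sensor_values', INTERVAL '1 month');
```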
Regardless, the server load required to insert historical data should improve substantially when we implement #674 with our planned v0.18 Milestone.
Closing to track work on this under:
@aufdenkampe, thanks for the update. I'm just wondering what the impact is when a POST is made with a timestamp that falls in a range that has already been compressed.
Of course, this issue will probably still exist with #688, and I wonder what the effect of it being queued will be.
The effect of a POST of data into a compressed time chunk is that the server needs substantially more resources to insert the data, because the entire chunk needs to be decompressed, appended to, then compressed again.
This only applies to writes; reading from a compressed chunk is fast.
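In TimescaleDB terms, that rewrite roughly corresponds to the following (a sketch using a hypothetical `sensor_values` hypertable; newer TimescaleDB versions do this transparently on insert, but the chunk still has to be rewritten):

```sql
-- Hypothetical hypertable name. Decompress the affected old chunks...
SELECT decompress_chunk(c, if_compressed => true)
  FROM show_chunks('sensor_values', older_than => INTERVAL '1 month') c;

-- ...insert the late-arriving rows here...

-- ...then recompress (normally the scheduled compression policy job does this).
SELECT compress_chunk(c, if_compressed => true)
  FROM show_chunks('sensor_values', older_than => INTERVAL '1 month') c;
```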
It's likely that systems in the field will stop transmitting readings, then reconnect at a later date and upload their stored readings. #485
It would be optimal to have a defined model for what historical timeframe of uploads is supported.
I'm just wondering what the current internal TimescaleDB compression settings are, if any. https://www.timescale.com/features/compression
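For what it's worth, anyone with database access can inspect the active compression settings and policy jobs through TimescaleDB's information views (a sketch; exact view columns depend on the TimescaleDB version):

```sql
-- Per-hypertable compression settings (segmentby/orderby columns, etc.)
SELECT * FROM timescaledb_information.compression_settings;

-- Scheduled compression policy jobs and their intervals
SELECT * FROM timescaledb_information.jobs
 WHERE proc_name = 'policy_compression';
```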
Examples:
- #658 - three systems lost connectivity Mar 12, reconnected May 23rd, and uploaded their stored data once restored.
- Salmon Creek - had its solar power cable pulled May 15 and restored June 27th, and on reconnection transmitted its data: https://monitormywatershed.org/sites/TUCA_Sa01/
- Mill Creek - installed June 12, stopped a couple of days later: https://monitormywatershed.org/sites/TUCA_Mi06/
- The very remote Navarro Creek - stopped connecting June 2: https://monitormywatershed.org/sites/TUCA-Na13/
As I understand it, TimescaleDB will compress data based on a historical timeframe, and then later, if there is a POST into that period, it will decompress the rows associated with the device and insert the new record. It's likely that the decompression will take time, and could exceed the 10-second timeout that the device waits for an HTTP 201.
In that case the device will retry on the next connection, and continue retrying until an HTTP 201 is received.
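That retry behavior can be sketched as follows (a simplified model only; the function and parameter names here are hypothetical, not the actual logger firmware API):

```python
import time


def post_until_accepted(post_fn, timeout_s=10, max_attempts=10, wait_s=0):
    """Retry a POST across connection attempts until HTTP 201 is received.

    post_fn: hypothetical callable taking a timeout in seconds and returning
    an HTTP status code, or raising TimeoutError if the server responds too
    slowly (e.g. while decompressing a chunk).
    Returns the number of attempts used, or None if max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            status = post_fn(timeout_s)
        except TimeoutError:
            status = None  # server too slow: treat as a failed attempt
        if status == 201:
            return attempt
        time.sleep(wait_s)  # in the field: wait for the next connection window
    return None
```

Note that under this model the same record may be POSTed several times before a 201 arrives, so a slow decompression can multiply the very load that caused the timeout.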
Then I'm guessing that, on some schedule, TimescaleDB will recompress the data.