MACscr opened this issue 4 years ago
Hello MACscr!
Thank you for your suggestion!
The TimescaleDB addon is basically a PostgreSQL database server with the TimescaleDB extensions on top of it.
Depending on what you would like to achieve, you will indeed need extra components. It's a bit like mixing and matching clothes: depending on how you want to look, you pick your shirt and shoes. That makes it a little difficult to write a tutorial about how you should dress yourself, because it's for a good part personal taste.
What I can do is describe my own setup.
First, you need this addon. It will install a TimescaleDB-enabled SQL database for you. Second, you need to feed this thing with data. Third, you need ways to retrieve that data and do cool stuff with it.
I imagine you already did step 1, so now you need some data flowing in.
What I did was:
1- Instruct the recorder to send all its data to Timescale, so we gain performance in parts of Home Assistant like the history browser and other charts. For this, you simply need to point your recorder connection string to TimescaleDB. This is done by using this connection string for the recorder: postgresql://homeassistant:yourpasswordhere@77b2833f-timescaledb/homeassistant
This will store all data of home-assistant into your timescaledb instance.
Home Assistant will use these tables, but you can also query that data yourself.
2- Use LTSS for long-term storage of statistics data. For this, please see the documentation of LTSS. You can use the exact same connection string as in the previous step. This will feed data about sensor changes into the LTSS table.
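Once LTSS is running, a quick sanity check that rows are arriving could look like this (a minimal sketch; the ltss table with its time, entity_id and state columns is created by the LTSS integration):

-- Show the newest rows LTSS has written
SELECT "time", entity_id, state
FROM ltss
ORDER BY "time" DESC
LIMIT 10;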
Now that you have data, you can use it to create charts, for example about the charging state of my car. I use Grafana for this.
My dashboard looks like this:
The query behind the top-left chart, for example:
Please use the documentation of Grafana on how to create charts, and you need some knowledge of SQL of course. The extra column 'state_numeric' is a computed column I created, to aid charts that work on numeric states. You can create this computed column using this script:
CREATE OR REPLACE FUNCTION public.isnumeric(
text)
RETURNS boolean
LANGUAGE 'plpgsql'
COST 100
IMMUTABLE STRICT PARALLEL UNSAFE
AS $BODY$
DECLARE x NUMERIC;
BEGIN
x = $1::NUMERIC;
RETURN TRUE;
EXCEPTION WHEN others THEN
RETURN FALSE;
END;
$BODY$;
ALTER FUNCTION public.isnumeric(text)
OWNER TO postgres;
and then
ALTER TABLE public.ltss
ADD COLUMN state_numeric double precision GENERATED ALWAYS AS (
CASE
WHEN isnumeric((state)::text) THEN (state)::double precision
ELSE NULL::double precision
END) STORED;
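With that column in place, a Grafana time-series query over the LTSS data can stay very simple. A rough sketch (the entity_id is just an example, and $__timeFilter is Grafana's PostgreSQL macro for the selected time range):

-- Plot one entity's numeric state over the dashboard's time range
SELECT
  "time" AS "time",
  state_numeric AS "living room temperature"
FROM ltss
WHERE entity_id = 'sensor.livingroom_temperature' -- example entity
  AND $__timeFilter("time")
ORDER BY "time";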
I coupled Grafana to Timescale with the following config:
Please note that I have created a special grafana user here, for security reasons. You have to create users yourself using the PostgreSQL documentation.
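If you want to do the same, a minimal sketch of creating such a read-only Grafana user could look like this (the role name and password are placeholders; adjust the database and schema to your setup):

-- Create a login role for Grafana with read-only access to the homeassistant database
CREATE ROLE grafana WITH LOGIN PASSWORD 'choose-a-strong-password';
GRANT CONNECT ON DATABASE homeassistant TO grafana;
GRANT USAGE ON SCHEMA public TO grafana;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO grafana;
-- Make tables created later (like continuous aggregates) readable too
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO grafana;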
This is only one of many use cases for TimescaleDB, but it's a real-world example. It should give you a starting point from which you can dig deeper into your data and create nice dashboards or alerting.
Does this at least partially answer your question?
To save storage space, you basically have 3 options:

1) Compress the data (see the sketch below). You have just proven that it can even beat Influx.
Pros:
- Very easy.
- Instant space savings.
Cons:
- Compressed data cannot be updated. Normally not a problem (because Influx cannot update at all!), but should you have the need, you must first decompress the chunk, update it, and re-compress it again.
- A little bit of overhead when querying compressed data (depending on the type of query; some are even faster!).

2) Remove data you don't need (like wiping the attributes column to NULL).
Pros:
- Space saving.
- Easy to do.
Cons:
- Should you need that data in the future, bad luck.

3) Downsample your data, using Continuous Aggregates.
Pros:
- Saves a LOT of space.
- Will speed up queries over large timespans tremendously.
Cons:
- No more access to all measurements, only the chosen aggregate values, like min/max/avg/mean within the chosen timeframe (like 5 minutes/hour/day/month/etc).
- You have to think through your data needs upfront, because you are basically replacing a large set of old data with 1 record containing aggregated values. There is no way to get your original data back if you deleted it to save space.

Now, you could even combine all 3 methods in a mix-and-match style if you like, depending on your hunger for data. Maybe start with compression first, and create some continuous aggregates to speed up queries spanning thousands of records. That way, you have a nice balance between performance and not losing any data.
With that in place, you could start dropping old chunks of data to save space if needed. Clearing the attributes column is more difficult with compression in place, but still an option.
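For option 1, a minimal sketch of enabling native compression on the LTSS hypertable might look like this (segmenting by entity_id and the 30-day threshold are just example choices; see the TimescaleDB compression docs for details):

-- Enable compression on the ltss hypertable, grouping compressed data per entity
ALTER TABLE ltss SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'entity_id',
  timescaledb.compress_orderby = 'time DESC'
);
-- Automatically compress chunks older than 30 days
SELECT add_compression_policy('ltss', INTERVAL '30 days');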
The good news: lots of options, freedom everywhere :slight_smile:
Thanks so much for the detailed response! Ah, I really have no interest in using grafana at the moment (could easily change), but I do like the ability to be able to do queries from home assistant though and having https://www.home-assistant.io/integrations/sql/ does allow me to create some custom stuff. I really wish the awesome mini graph cards allowed us to supply exactly what we wanted it to graph versus relying on entity histories, etc. LTSS is really only useful for fetching from something like grafana or manual queries, right?
LTSS is almost the same as the recorder. In fact LTSS was created by looking at the recorder's code.
The recorder is used for ALL state that has to be saved, and its history, including events in the system, switch states and so forth. LTSS can be (but does not have to be) used alongside the recorder to focus a little bit more on long-term sensor trends like temperature charts or voltage measurements etc. It's created to be more in line with Timescale.
A simple 'get me all outside temperatures for today'-query can be satisfied by both. But.. the recorder cleans up after x days, LTSS won't (because you don't want it to delete your precious data).
" I really wish the awesome mini graph cards allowed us to supply exactly what we wanted it to graph versus relying on entity histories" -Me too! A simple SQL-Graph card.
"LTSS is really only useful for fetching from something like grafana or manual queries, right?" Correct. It's just a bridge between HA and the Database, nothing more. It will get your data in, but not out. You have query it out yourself.
This is really useful, thank you! Consider adding it to the README.md.
@Expaso Thank you for the detailed description which has given me some food for thought. However, I still have a few questions to understand :)
What is the advantage of the additional numeric column? I see that this is needed to use "Format as Time Series" in Grafana. But I don't understand yet what is the advantage compared to "Format as Table"?
Doesn't the additional column also lead to more needed memory? (Are the numeric states stored twice? Once in state and once in state_numeric.)
And: Does your script transfer the past states or only the future states into the new column state_numeric? (Sorry, I am not so good in SQL yet.)
Hi @bingo6 !
You're very welcome! I will try to answer your questions one by one:
"What is the advantage of the additional numeric column? I see that this is needed to use "Format as Time Series" in Grafana. But I don't understand yet what is the advantage compared to "Format as Table"?"
Home Assistant works with states. A state can be anything. A temperature, the day of the week, the location of your phone etc. Because of this, the data is stored in a text-column, so any value can be inserted. Now when you come to the point that you want to create charts for a specific sensor, say you want to plot the average temperature of your livingroom, you would first need to convert the state to a number, so you can do math on it (calculating the average). You could convert this column to a number on the fly in your query, but this comes with a cost in the form of CPU.
Because I'm running Timescale on a Raspberry Pi 4 with an SSD attached, I chose to convert the state to a numeric value upfront, during insert of the records, using this calculated column. This introduces a little bit of CPU overhead during insert, but makes queries much, much faster. Also, the queries themselves can be simpler, because the extra conversion is no longer needed.
_"Doesn't the additional column also lead to more needed memory? (Are the numeric states stored twice? Once in state and once in statenummeric)."
Yes, this is indeed true. The calculated column is an extra column which has to be stored, so you need more storage space. Since I'm running off an SSD, I have plenty of space, so no problem there. Storage is cheap; I'd rather have a snappy dashboard with near-instant performance. That said, the extra storage overhead is not large: it's only a few bytes per state. In fact, if you were to shorten the name of your entities by 1 character in Home Assistant, you would already make up for the extra space needed, because of the savings in the attributes column. If you are really concerned about space, you could also choose to wipe the attributes column altogether (after insert, with a trigger for example, if you are not using it). It will save you more than 50% of the total size!
So, big gain, little loss.
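If you do want to go the attributes-wiping route, a minimal sketch of such a trigger could look like this (names are placeholders; note that a BEFORE INSERT trigger is the idiomatic way to blank the column on the row being inserted):

-- Blank the attributes column on every new LTSS row
CREATE OR REPLACE FUNCTION public.ltss_wipe_attributes()
RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  NEW.attributes := NULL;
  RETURN NEW;
END;
$$;

CREATE TRIGGER ltss_wipe_attributes_trg
  BEFORE INSERT ON ltss
  FOR EACH ROW
  EXECUTE FUNCTION public.ltss_wipe_attributes();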
_"And: Does your script transfer the past states or only the future state into the new column statenummeric? (Sorry I am not so good in SQL yet). "
All states, past and future. The generated column is computed from the 'state' field. If you would update or insert that field (old or new data), the state_numeric will follow that change immediately.
Thank you for the detailed answer. It convinced me completely, so I also added the column :)
Just by compressing the data, storage space is no longer a problem either, and as you had already pointed out, there are other options to reduce it further.
I also think that adding the hints to the README.md would be useful.
@Expaso Can you please point out where to change this timeframe? I tried to find it, but couldn't. "No more access to all measurements, but only the chosen aggregates values, like min/max/avg/mean within the chosen timeframe (like 5 minute/hour/day/month/etc)."
Hi @jli113 ,
The timeframes are chosen by you. It's how you set up your continuous aggregates. I have created 2 myself:
ltss_hourly and ltss_daily.
With both I have enough data to balance detail vs performance.
For these in-depth usage questions, it's best that I refer to the excellent documentation of TimescaleDB itself:
It's key to understand these principles to get the most out of timescale.
Thanks~
To actually get the TimescaleDB advantage, do we have to manually convert the events, ltss, recorder_runs and states tables to hypertables?
Hi @Davincible,
Home Assistant tables like recorder_runs and states are just normal tables. I would suggest leaving them like that, because the HA recorder also deletes data from those tables.
That's why the LTSS integration is there: to store data for a longer time.
The LTSS table therefore is created as a hypertable automatically by the LTSS integration, so you don't have to convert it manually.
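If you want to verify this, a quick check using TimescaleDB's standard informational view would be:

-- List the hypertables in the database; ltss should show up here
SELECT hypertable_schema, hypertable_name
FROM timescaledb_information.hypertables;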
@Expaso , can you share your ltss_hourly and ltss_daily materialized view ?
Sure! Please read also: https://docs.timescale.com/api/latest/continuous-aggregates/create_materialized_view/
CREATE MATERIALIZED VIEW ltss_summary_hourly( entity_id, bucket, state_avg, state_max, state_min, state_count)
WITH (timescaledb.continuous) AS
SELECT ltss.entity_id,
time_bucket('1h'::interval, ltss."time") AS bucket,
avg(ltss.state_numeric) AS state_avg,
max(ltss.state_numeric) AS state_max,
min(ltss.state_numeric) AS state_min,
count(ltss.state_numeric) AS state_count
FROM ltss
GROUP BY ltss.entity_id, (time_bucket('1h'::interval, ltss."time"));
and
CREATE MATERIALIZED VIEW ltss_summary_daily( entity_id, bucket, state_avg, state_max, state_min, state_count)
WITH (timescaledb.continuous) AS
SELECT ltss.entity_id,
time_bucket('1d'::interval, ltss."time") AS bucket,
avg(ltss.state_numeric) AS state_avg,
max(ltss.state_numeric) AS state_max,
min(ltss.state_numeric) AS state_min,
count(ltss.state_numeric) AS state_count
FROM ltss
GROUP BY ltss.entity_id, (time_bucket('1d'::interval, ltss."time"));
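To keep these views up to date automatically, you would normally also add a refresh policy per view. A minimal sketch (the offsets and schedule are just example values):

-- Refresh the hourly aggregate every hour, for data between 3 days and 1 hour old
SELECT add_continuous_aggregate_policy('ltss_summary_hourly',
  start_offset      => INTERVAL '3 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');

-- Refresh the daily aggregate once a day, for data between 1 month and 1 day old
SELECT add_continuous_aggregate_policy('ltss_summary_daily',
  start_offset      => INTERVAL '1 month',
  end_offset        => INTERVAL '1 day',
  schedule_interval => INTERVAL '1 day');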
would be nice to have an example dashboard to import
@Expaso
I just discovered this very interesting project :+1:.
I was about to convert my SQLite database to PostgreSQL, but then I thought ... Maybe I should just convert it to TimescaleDB-HA instead.
Just to clarify though ...
With this integration, are there 2 Databases being used:
Or is it only one TimescaleDB-HA for everything ?
Thanks in advance for the Clarification :+1: .
Hi @luckylinux ,
Basically, this addon adds a PostgreSQL server to your setup, including the TimescaleDB extensions. Now, HA is using SQLite by default, which is fine. It uses that database for the recorder component and statistics. All the graphs and historical data you see in HA come from this database.
Now, the recorder component does not store your data indefinitely. It has a set horizon of a few days in most cases, in which old data is being discarded.
The first logical step is to move this recorder database over from SQLite to this addon's PostgreSQL instance. Running SQLite AND Postgres is just a waste of resources. But... moving it to this addon also opens up more possibilities to query around your data. It makes dashboarding and monitoring with 3rd party tools like Grafana or Prometheus easier.
Now, since one of the goals of this addon is to also provide a means of storing historical data, we need more than 3 days' worth of data, right? That's where the second database comes in. This database is fed by the LTSS component. This component is basically a clone of the recorder component, but is TimescaleDB-aware and does not remove old data.
Using Timescale's hypertables, you can store vast amounts of data without much pain. For instance, I have never had to delete ANY data since I created this component a few years ago, and I query around happily. The database is 170GB or so. This is WITH Timescale compression though, but that is transparent.
So having both databases there is just the logical thing to do. They store somewhat overlapping data for a few days, but use it for different purposes. Your HA database is for operational HA things, and the TimescaleDB database for statistical analysis.
@Expaso
Thank you for your reply.
But since it's an "addon" I guess it's not really applicable to my case, where I am running homeassistant myself under Podman (Docker). I'm NOT using Hassio ...
So basically you are telling me that I should create/have 2 database servers/containers in the end:
- homeassistant-timescaledb-main (converted from home-assistant_v2.db - conversion is not so pain-free; I'm trying to get all the learnings I could find from the net into one migration script -> https://github.com/luckylinux/migrate-sql-database)
- homeassistant-timescaledb-ltss for long-term storage
On top of that I also have a homeassistant-server container of course.
Does this sound correct?
@luckylinux , almost..
and of course, on top of that, the homeassistant-server container.
@Expaso
Is there some difference by actually running the container standalone (e.g. docker pull husselhans/hassos-addon-timescaledb-amd64:latest) compared to what I'm doing (e.g. docker pull timescale/timescaledb-ha:pg16-all), besides the fact that I will have 2 containers running (which IMHO could be useful for maintenance) ?
I mean ... Maybe I don't understand at all, but it seems to me like the timescaledb container is pretty standard/stock. The main part of the project is to explain the Configuration of HomeAssistant in order to take Advantage of it. And If I understand correctly, that is the purpose of this Issue Thread.
No offense, I'm just trying to understand :100: .
The purpose of this thread is to provide information about my timescaledb-addon and how to use it with home-assistant.
First, I wouldn't pull
These are the major differences between the image you mention and this addon-image:
But please choose as you like. It's open-source right?
@Expaso
That's quite clear, thanks :+1: .
So, if I understand correctly, your "addon" is not so much an "addon" (at least not for me if I run it as a container), but rather a "stock" timescaledb-ha (or timescaledb?) with some helper scripts and additional programs/toolkits installed.
Just to clarify: it's basically not something homeassistant-specific, but rather a "normal" (generic) timescaledb with extra packages. Correct ? Maybe I was confused by the name of the image after all ...
I am fine running your Container, I'd just like to understand better.
Right now you can argue I don't even have a backup strategy for other containers' postgresql databases (except ZFS snapshots, that is).
So of course I am open to options ;).
I wonder then why postgres and timescaledb-ha do not include the toolkit and pgAgent by default ...
EDIT: alright, this is more where I should be looking for the Container Details: https://github.com/Expaso/hassos-addon-timescaledb/releases rather than this GitHub Repository itself.
Hahaha yes, that is indeed the place to be!
My addon IS a real Home Assistant addon with all the bells and whistles, but you can also run it standalone, just because some people want to run it in plain Docker or Kubernetes or whatever. It's a feature I added in the latest release, because some people run Home Assistant on a Raspberry Pi but want to run this addon on beefier hardware. It is a fully capable database server, as a matter of fact.
I finally managed to get my migration script kinda working now ...
Some very tricky-to-diagnose DNS name resolution behaviour of Podman made me lose a few days :upside_down_face:. I might need to clean up the repos quite a bit ...
https://github.com/luckylinux/migrate-sql-database#general
Now I need to decide which Database to use :laughing:.
Hi. I believe there are two small errors in your materialized views: state_min is missing from the list of fields, resulting in an obscure error: ERROR: column "state_count" specified more than once
With state_min added, the error is gone.
CREATE MATERIALIZED VIEW ltss_summary_hourly( entity_id, bucket, state_avg, state_max, state_min, state_count)
WITH (timescaledb.continuous) AS
SELECT ltss.entity_id,
time_bucket('1h'::interval, ltss."time") AS bucket,
avg(ltss.state_numeric) AS state_avg,
max(ltss.state_numeric) AS state_max,
min(ltss.state_numeric) AS state_min,
count(ltss.state_numeric) AS state_count
FROM ltss
GROUP BY ltss.entity_id, (time_bucket('1h'::interval, ltss."time"));
CREATE MATERIALIZED VIEW ltss_summary_daily( entity_id, bucket, state_avg, state_max, state_min, state_count)
WITH (timescaledb.continuous) AS
SELECT ltss.entity_id,
time_bucket('1d'::interval, ltss."time") AS bucket,
avg(ltss.state_numeric) AS state_avg,
max(ltss.state_numeric) AS state_max,
min(ltss.state_numeric) AS state_min,
count(ltss.state_numeric) AS state_count
FROM ltss
GROUP BY ltss.entity_id, (time_bucket('1d'::interval, ltss."time"));
@42Network You are totally right! Thank you for your fix!! I have edited the original post!
Are there any instructions on cleaning the DB (with a reconfig of the recorder) to bring the size of the DB down? I know my top states and already removed some from the recorder config, but the DB doesn't shrink...
Hi @bezigebever,
Removing data does not reclaim disk space immediately. See also: https://www.postgresql.org/docs/16/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY
If you want to reclaim disk space, run VACUUM FULL on your database.
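For example, from psql or pgAdmin (note that VACUUM FULL rewrites the tables and takes an exclusive lock while it runs, so HA will be briefly blocked):

-- Rewrite all tables in the current database and return free space to the OS
VACUUM FULL;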
@expaso thanks for your support, it's been running for 10 minutes now; I suppose it takes quite some time. I'll keep you informed. Do you have some best practices, e.g. on scheduling the vacuum or using compression? DBs are quite new for me, and I think for a lot of people, so I'm willing to do a write-up afterwards if that can help. Really like the effort you put into this add-on!
Hello @expaso,
Just a quick question: I took too long to upgrade the TimescaleDB addon, and when I did it, straight from 1.1.6 to 4.2.0, it failed as it does not manage the upgrade of PostgreSQL from 12 to the latest used.
It managed to recover my previous version, but how can I upgrade Postgresql to the latest from there?
A bit dumb question, but I thank you if you can guide me on this.
Hi @luisfmrosa!
Thank you for reaching out! No worries, I will help you! :)
Could you please try these steps and report back to me? https://github.com/expaso/hassos-addon-timescaledb/issues/44#issuecomment-2294893495
Amazing, it worked!
Thank you very much for your help, much appreciated.
Luis
Hi there, and thanks for this great add-on. After reading this thread, I'd like to add a few more questions.
After I instruct the recorder to send all its data to Timescale (namely the DB named homeassistant with timescale_enabled) and configure the recorder like this:
recorder:
db_url: postgresql://postgres:homeassistant@77b2833f-timescaledb/homeassistant
auto_purge: true
purge_keep_days: 30
Do I then somehow have to run VACUUM FULL on my homeassistant database via pgAdmin4 manually every now and then or can we somehow schedule that in HA?
In the description here, there is a (for me) confusing sentence. Would you mind helping me understand:
For the Home Assistant recorder, you probably need some SQL storage too. That means you also need to bring stuff like MariaDB or Postgres to the table (unless you keep using the SQLite database).
So.. why not combine these? Seriously?! You ask...
By this, do you mean to say that with my above described recorder and having configured your add-on like this, I'm done?
databases:
- homeassistant
timescale_enabled:
- homeassistant
timescaledb:
telemetry: basic
maxmemory: 512MB
maxcpus: 2
max_connections: 30
system_packages: []
init_commands: []
retry_upgrade: false
I'm wondering because here you mention:
To bring your precious sensor-statistics into TimescaleDB, you can use the Long Time State Storage Component. Available on HACS.
Which makes me wonder, aren't they already in TimescaleDB? Or did you simply mean, that if we'd like to have long term storage, we could add another DB to the configuration of TimescaleDB (I showed above) with timescale_enabled?
The ltss integration states that it needs PostGIS and your add-on says it is integrated. Is there a step necessary or is this automatically handled by your add-on?
Hi @ovizii !!
Thanks for reaching out! I'll try to answer your questions one-by-one:
"Do I then somehow have to run VACUUM FULL on my homeassistant database via pgAdmin4 manually every now and then or can we somehow schedule that in HA?"
No, this is not needed. It won't hurt, but postgresql takes care of this automatically under normal conditions.
"By this, do you mean to say that with my above described recorder and having configured your add-on like this, I'm done?"
For the standard recorder, yes. HA normally uses SQLite, but this is not needed anymore now that you have a full-blown PostgreSQL server running. This gives you a single point of data storage, namely: the PostgreSQL database in the TimescaleDB addon.
"Which makes me wonder, aren't they already in TimescaleDB? Or did you simply mean, that if we'd like to have long term storage, we could add another DB to the configuration of TimescaleDB (I showed above) with timescale_enabled?"
Home Assistant has its default recorder component. This component automatically removes data older than 30 days in your setup. That is not really long-term. The LTSS component is basically a copy of the recorder component that does not remove its data at all. Never. (Hence, long-term storage.) Both can point to the same Timescale database, whereby the normal recorder tables will hold detailed information for a maximum of 30 days (I have mine set up for 2 days btw), and the LTSS table (from the LTSS component) will hold detailed state information forever. This table will grow quickly, so setting up compression and/or continuous aggregates is a good idea to keep this in check. See the Timescale docs for more details.
"The ltss integration states that it needs PostGIS and your add-on says it is integrated. Is there a step necessary or is this automatically handled by your add-on?"
Automatically handled. Both Timescale and Postgis are just extensions of a postgresql database. Just point everything to the homeassistant database (default setup) and everything should just work, and you can start building queries in Grafana for instance on your LTSS table in the homeassistant database.
Because the recorder component also points to this database, you can also query the states tables of the recorder component. Just be aware that the recorder tables only hold data from the last 30 days in your setup.
Oh, and before I forget: this addon was created before Home Assistant itself also started calculating some long-term statistics of its own. You can query those from the homeassistant database as well, because they are managed by the default recorder component.
You basically have a one-stop shop for all your Home Assistant data here, capable of providing for all your data needs ;)
Would really appreciate some additional documentation on actually integrating it with home assistant. At its default installation state, it simply installs the server. Would love additional information on getting it fully working with Home Assistant (I guess LTSS is required too?) and replacing recorder if possible (does it currently go to both)? Some examples of actually using that data within HA would be great too.
Please don't take these as complaints, as I definitely appreciate what was done here. I just need a bit more information on how to use this special integration after being sold on how good it was. =)