expaso / hassos-addons

Home Assistant Add-ons
Apache License 2.0

Pinned: Additional Documentation for Timescaledb Integration #1

Open MACscr opened 4 years ago

MACscr commented 4 years ago

Would really appreciate some additional documentation on actually integrating it with Home Assistant. In its default installation state, it simply installs the server. Would love additional information on getting it fully working with Home Assistant (I guess LTSS is required too?) and replacing the recorder if possible (does it currently go to both?). Some examples of actually using that data within HA would be great too.

Please don't take these as complaints, as I definitely appreciate what was done here. I just need a bit more information on how to use this special integration after being sold on how good it is. =)

expaso commented 4 years ago

Hello MACscr!

Thank you for your suggestion!

Basic Setup

The TimescaleDB addon is basically a PostgreSQL database server with the TimescaleDB extensions on top of it.

Depending on what you would like to achieve, you will indeed need extra components. It's a bit like mixing and matching clothes: depending on how you want to look, you pick your shirt and shoes. That makes it a little difficult to write a tutorial about how you should dress yourself, because it's in large part personal taste.

What I can do is describe my own setup.

First, you need this addon. It will install a TimescaleDB-enabled SQL database for you. Second, you need to feed this thing with data. Third, you need ways to retrieve that data and do cool stuff with it.

I imagine you already did step 1. So now, you need some data flowing in.

What I did was:

1- Instruct the recorder to send all its data to Timescale, so we gain performance in parts of Home Assistant like the history browser and other charts. For this, you simply need to point your recorder connection string to TimescaleDB. This is done by using this connection string for the recorder: postgresql://homeassistant:yourpasswordhere@77b2833f-timescaledb/homeassistant

This will store all of Home Assistant's data in your TimescaleDB instance.

Home Assistant will use these tables, but you can also query that data yourself.
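
For example, a quick sanity check straight against the recorder's states table (just a sketch: the entity id is a placeholder, and the column names assume the classic recorder schema, which can differ between Home Assistant versions):

SELECT entity_id, state, last_updated
FROM states
WHERE entity_id = 'sensor.living_room_temperature'
ORDER BY last_updated DESC
LIMIT 10;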

2- Use LTSS for long-term storage of statistics data. For this, please see the documentation of LTSS. You can use the exact same connection string as in the previous step. This will feed data about sensor changes into the ltss table:

[screenshot of the ltss table]

Now that you have data, you can use it to create graphs, for example about the charging state of my car. I use Grafana for this.

My dashboard looks like this:

[dashboard screenshot]

The query behind the top-left chart, for example:

[query screenshot]
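
Roughly, such a Grafana query could look like this (a sketch, assuming the ltss table and the state_numeric column described below; the entity id is a placeholder, and $__timeFilter is a Grafana macro for the dashboard's time range):

SELECT
    "time" AS "time",
    state_numeric AS "state of charge"
FROM ltss
WHERE entity_id = 'sensor.car_state_of_charge'
  AND $__timeFilter("time")
ORDER BY "time";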

Please use the documentation of Grafana on how to create charts; you'll need some knowledge of SQL, of course. The extra column 'state_numeric' is a computed column I created to aid charts that work on numeric states. You can create this computed column using this script:

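-- Returns TRUE when the given text can be cast to NUMERIC, FALSE otherwise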
CREATE OR REPLACE FUNCTION public.isnumeric(
    text)
    RETURNS boolean
    LANGUAGE 'plpgsql'
    COST 100
    IMMUTABLE STRICT PARALLEL UNSAFE
AS $BODY$
DECLARE x NUMERIC;
BEGIN
    x = $1::NUMERIC;
    RETURN TRUE;
EXCEPTION WHEN others THEN
    RETURN FALSE;
END;
$BODY$;

ALTER FUNCTION public.isnumeric(text)
    OWNER TO postgres;

and then

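-- Add a stored generated column mirroring 'state' as DOUBLE PRECISION (NULL for non-numeric states)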
ALTER TABLE public.ltss
    ADD COLUMN state_numeric double precision GENERATED ALWAYS AS (
CASE
    WHEN isnumeric((state)::text) THEN (state)::double precision
    ELSE NULL::double precision
END) STORED;

I coupled Grafana to Timescale with the following config:

[Grafana data source configuration screenshot]

Please note that I created a special Grafana user here, for security reasons. You have to create such users yourself; see the PostgreSQL documentation.
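
A minimal sketch of creating such a read-only user (the role name and password are placeholders; see the PostgreSQL docs on roles and GRANT for details):

-- Create a login role for Grafana with read-only access
CREATE ROLE grafana_reader WITH LOGIN PASSWORD 'choose-a-strong-password';
GRANT CONNECT ON DATABASE homeassistant TO grafana_reader;
GRANT USAGE ON SCHEMA public TO grafana_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO grafana_reader;
-- Also cover tables created later
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO grafana_reader;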

This is only one of many use cases for TimescaleDB, but it's a real-world example. It should give you a starting point from which you can dig deeper into your data and create nice dashboards or alerting.

Does this at least partially answer your question?

Reducing the size of the database

To save storage space, you basically have 3 options:

1) Compress the data. You have just proven that it can even beat Influx.

Pros:

- Very easy
- Instant space savings

Cons:

- Compressed data cannot be updated. Normally not a problem (because Influx cannot update at all!), but should you have the need, you must first decompress the chunk, update it, and re-compress it afterwards.
- A little bit of overhead when querying compressed data (depending on the type of query; some are even faster!).

2) Remove data you don't need (like wiping the attributes column to NULL)

Pros:

- Space savings
- Easy to do

Cons:

- Should you need that data in the future, bad luck.

3) Downsample your data, using Continuous Aggregates

Pros:

- Saves a LOT of space.
- Will speed up queries over large timespans tremendously.

Cons:

- No more access to all measurements, only the chosen aggregate values, like min/max/avg/mean within the chosen timeframe (like 5 minutes/hour/day/month/etc.).
- You have to think your data needs through upfront, because you are basically replacing a large set of old data with 1 record containing aggregated values. There is no way to get your original data back if you deleted it to save space.

Now, you could even combine all 3 methods in a mix-and-match style if you like, depending on your hunger for data. Maybe start with compression first (see the sketch below), and create some continuous aggregates to speed up queries spanning thousands of records. That way, you have a nice balance between performance and not losing any data.

With that in place, you could start dropping old chunks of data to save space if needed. Clearing the attributes column is more difficult with compression in place, but still possible.
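
For option 1, a sketch of what enabling compression could look like (assuming TimescaleDB 2.x and the ltss hypertable; the segmentby column and the 14-day policy are just example values to tune):

-- Enable compression on the ltss hypertable, grouping compressed rows per entity
ALTER TABLE ltss SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'entity_id'
);

-- Automatically compress chunks once they are older than 14 days
SELECT add_compression_policy('ltss', INTERVAL '14 days');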

The good news: lots of options, freedom everywhere :slight_smile:

MACscr commented 4 years ago

Thanks so much for the detailed response! Ah, I really have no interest in using Grafana at the moment (that could easily change), but I do like the ability to do queries from Home Assistant, and having https://www.home-assistant.io/integrations/sql/ does allow me to create some custom stuff. I really wish the awesome mini graph cards allowed us to supply exactly what we want them to graph, versus relying on entity histories, etc. LTSS is really only useful for fetching from something like Grafana or manual queries, right?

expaso commented 4 years ago

LTSS is almost the same as the recorder. In fact LTSS was created by looking at the recorder's code.

The recorder is used for ALL state that has to be saved, and its history, including events in the system, switch states, and so forth. LTSS can be (but does not have to be) used alongside the recorder to focus a little bit more on long-term sensor trends like temperature charts or voltage measurements. It was created to be more in line with Timescale.

A simple 'get me all outside temperatures for today' query can be satisfied by both. But the recorder cleans up after x days; LTSS won't (because you don't want it to delete your precious data).
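
As a sketch, that query against the LTSS table could look like this (the entity id is a placeholder):

SELECT "time", state_numeric
FROM ltss
WHERE entity_id = 'sensor.outside_temperature'
  AND "time" >= date_trunc('day', now())
ORDER BY "time";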

" I really wish the awesome mini graph cards allowed us to supply exactly what we wanted it to graph versus relying on entity histories" -Me too! A simple SQL-Graph card.

"LTSS is really only useful for fetching from something like grafana or manual queries, right?" Correct. It's just a bridge between HA and the Database, nothing more. It will get your data in, but not out. You have query it out yourself.

yoheines commented 3 years ago

This is really useful, thank you! Consider adding it to the README.md.

bingo6 commented 3 years ago

@Expaso Thank you for the detailed description, which has given me some food for thought. However, I still have a few questions left :)

What is the advantage of the additional numeric column? I see that this is needed to use "Format as Time Series" in Grafana. But I don't understand yet what the advantage is compared to "Format as Table".

Doesn't the additional column also lead to more needed memory? (Are the numeric states stored twice? Once in state and once in state_numeric.)

And: Does your script transfer the past states or only the future states into the new column state_numeric? (Sorry, I am not so good at SQL yet.)

expaso commented 3 years ago

Hi @bingo6 !

You're very welcome! I will try to answer your questions one by one:

"What is the advantage of the additional numeric column? I see that this is needed to use "Format as Time Series" in Grafana. But I don't understand yet what is the advantage compared to "Format as Table"?"

Home Assistant works with states. A state can be anything: a temperature, the day of the week, the location of your phone, etc. Because of this, the data is stored in a text column, so any value can be inserted. Now, when you come to the point that you want to create charts for a specific sensor, say you want to plot the average temperature of your living room, you would first need to convert the state to a number, so you can do math on it (calculating the average). You could convert this column to a number on the fly in your query, but this comes at a cost in the form of CPU.

Because I'm running Timescale on a Raspberry Pi 4 with an SSD attached, I chose to convert the state to a numeric value upfront, during insert of the records, using this calculated column. This introduces a little bit of CPU overhead during insert, but it makes queries much, much faster. Also, the queries themselves can be simpler, because the extra conversion is no longer needed.

_"Doesn't the additional column also lead to more needed memory? (Are the numeric states stored twice? Once in state and once in statenummeric)."

Yes, this is indeed true. The calculated column is an extra column which has to be stored, so you need more storage space. Since I'm running off an SSD, I have plenty of space, so no problems there. Storage is cheap; I'd rather have a snappy dashboard with near-instant performance. That said, the extra storage overhead is not large: it's only a few extra bytes per state. In fact, if you were to shorten the names of your entities by 1 character in Home Assistant, you would already make up for the extra space needed, because of the savings in the attributes column :+1: If you are really concerned about space, you could also choose to wipe the attributes column altogether (on insert, with a trigger for example, if you are not using it). It will save you more than 50% of the total size!
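
If you want to go that route, a minimal sketch of such a trigger (note: to blank the column on its way in, a BEFORE INSERT trigger that modifies the row is the usual idiom):

-- Blank the attributes payload of every new ltss row before it is stored
CREATE OR REPLACE FUNCTION public.wipe_ltss_attributes()
    RETURNS trigger
    LANGUAGE plpgsql
AS $$
BEGIN
    NEW.attributes := NULL;
    RETURN NEW;
END;
$$;

CREATE TRIGGER ltss_wipe_attributes
    BEFORE INSERT ON public.ltss
    FOR EACH ROW
    EXECUTE FUNCTION public.wipe_ltss_attributes();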

So, big gain, little loss.

_"And: Does your script transfer the past states or only the future state into the new column statenummeric? (Sorry I am not so good in SQL yet). "

All states, past and future. The generated column is computed from the 'state' field. If you update or insert that field (old or new data), state_numeric will follow that change immediately.

bingo6 commented 3 years ago

Thank you for the detailed answer. It convinced me completely, so I also added the column :)

Just by compressing the data, storage space is no longer a problem either, and as you already pointed out, there are other options to reduce it further.

I also think that adding the hints to the README.md would be useful.

jli113 commented 2 years ago

@Expaso Can you please point out where to change this timeframe? I tried to find it, but couldn't. "No more access to all measurements, but only the chosen aggregates values, like min/max/avg/mean within the chosen timeframe (like 5 minute/hour/day/month/etc)."

expaso commented 2 years ago

Hi @jli113 ,

The timeframes are chosen by you. It's how you set up your continuous aggregates. I have created 2 myself:

ltss_hourly and ltss_daily.

With both I have enough data to balance detail vs performance.

For these in-depth usage questions, it's best that I refer you to the excellent documentation of TimescaleDB itself:

https://docs.timescale.com/timescaledb/latest/overview/core-concepts/continuous-aggregates/#quick-start

It's key to understand these principles to get the most out of timescale.

jli113 commented 2 years ago

Thanks~

Davincible commented 2 years ago

To actually get the TimescaleDB advantage, do we have to manually convert the events, ltss, recorder_runs and states tables to hypertables?

expaso commented 2 years ago

Hi @Davincible,

Home Assistant tables like recorder_runs and states are just normal tables. I would suggest leaving them like that, because the HA recorder also deletes data from those tables.

That's why the LTSS integration is there: To store data for longer time.

The ltss table is therefore created as a hypertable automatically by the LTSS integration, so you don't have to convert it manually.
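
You can verify that yourself if you like (a sketch, assuming TimescaleDB 2.x):

-- Lists all hypertables; the ltss table should show up here
SELECT hypertable_name FROM timescaledb_information.hypertables;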

edouardkleinhans commented 2 years ago

@Expaso , can you share your ltss_hourly and ltss_daily materialized view ?

expaso commented 2 years ago

> @Expaso, can you share your ltss_hourly and ltss_daily materialized views?

Sure! Please read also: https://docs.timescale.com/api/latest/continuous-aggregates/create_materialized_view/

CREATE MATERIALIZED VIEW ltss_summary_hourly( entity_id, bucket, state_avg, state_max, state_min, state_count)
WITH (timescaledb.continuous) AS
 SELECT ltss.entity_id,
    time_bucket('1h'::interval, ltss."time") AS bucket,
    avg(ltss.state_numeric) AS state_avg,
    max(ltss.state_numeric) AS state_max,
    min(ltss.state_numeric) AS state_min,
    count(ltss.state_numeric) AS state_count
   FROM ltss
  GROUP BY ltss.entity_id, (time_bucket('1h'::interval, ltss."time"));

and

 CREATE MATERIALIZED VIEW ltss_summary_daily( entity_id, bucket, state_avg, state_max, state_min, state_count)
WITH (timescaledb.continuous) AS
 SELECT ltss.entity_id,
    time_bucket('1d'::interval, ltss."time") AS bucket,
    avg(ltss.state_numeric) AS state_avg,
    max(ltss.state_numeric) AS state_max,
    min(ltss.state_numeric) AS state_min,
    count(ltss.state_numeric) AS state_count
   FROM ltss
  GROUP BY ltss.entity_id, (time_bucket('1d'::interval, ltss."time"));
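
Note that a continuous aggregate also needs a refresh policy to keep itself up to date. A sketch for the hourly view (assuming TimescaleDB 2.x; the offsets and interval are just example values):

SELECT add_continuous_aggregate_policy('ltss_summary_hourly',
    start_offset      => INTERVAL '3 hours',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');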

wuast94 commented 1 year ago

would be nice to have an example dashboard to import 😊

luckylinux commented 7 months ago

@Expaso

I just discovered this very interesting project :+1:.

I was about to convert my SQLite database to PostgreSQL, but then I thought... maybe I should just convert it to TimescaleDB-HA instead.

Just to clarify though ...

With this integration, are there 2 databases being used, or is it only one TimescaleDB-HA for everything?

Thanks in advance for the Clarification :+1: .

expaso commented 7 months ago

Hi @luckylinux ,

Basically, this addon adds a PostgreSQL server to your setup, including the TimescaleDB extensions. Now, HA uses SQLite by default, which is fine. It uses that database for the recorder component and statistics. All the graphs and historical data you see in HA come from this database.

Now, the recorder component does not store your data indefinitely. It has a set horizon of a few days in most cases, after which old data is discarded.

The first logical step is to move this recorder database over from SQLite to this addon's PostgreSQL instance. Running SQLite AND Postgres is just a waste of resources. But moving it to this addon also opens up more possibilities to query around your data. It makes dashboarding and monitoring with 3rd-party tools like Grafana or Prometheus easier.

Now, since one of the goals of this addon is to also provide a means of storing historical data, we need more than 3 days' worth of data, right? That's where the second database comes in. This database is fed by the LTSS component. This component is basically a clone of the recorder component, but it is TimescaleDB-aware and does not remove old data.

Using Timescale's hypertables, you can store vast amounts of data without much pain. For instance, I have never had to delete ANY data since I created this component a few years ago, and I happily query around. The database is 170GB or so. This is WITH Timescale compression though, but that is transparent.
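
If you are curious how big things are getting on your own setup, you can check with a query like this (a sketch, assuming TimescaleDB 2.x and the default database and table names):

SELECT pg_size_pretty(pg_database_size('homeassistant')) AS database_size,
       pg_size_pretty(hypertable_size('ltss'))           AS ltss_size;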

So having both databases there is just the logical thing to do. They store somewhat overlapping data for a few days, but use it for different purposes. Your HA database is for operational HA things, and the TimescaleDB database is for statistical analysis.

luckylinux commented 7 months ago

@Expaso

Thank you for your reply.

But since it's a "addon" I guess it's not really applicable to my case, where I am running homeassistant myself under Podman (Docker). I'm NOT using Hassio ...

So basically you are telling me that I should create/have 2 database servers/containers in the end:

On top of that, I also have a homeassistant-server container, of course.

Does this sound correct?

expaso commented 7 months ago

@luckylinux , almost...

and of course, on top of that, the homeassistant-server container.

luckylinux commented 7 months ago

@Expaso

Is there some difference in actually running the container standalone (e.g. docker pull husselhans/hassos-addon-timescaledb-amd64:latest) compared to what I'm doing (e.g. docker pull timescale/timescaledb-ha:pg16-all), besides the fact that I will have 2 containers running (which IMHO could be useful for maintenance)?

I mean... maybe I don't understand at all, but it seems to me like the timescaledb container is pretty standard/stock. The main part of the project is to explain the configuration of Home Assistant in order to take advantage of it. And if I understand correctly, that is the purpose of this issue thread.

No offense, I'm just trying to understand :100: .

expaso commented 7 months ago

The purpose of this thread is to provide information about my timescaledb-addon and how to use it with home-assistant.

First, I wouldn't pull latest; please use a version tag. That makes sure you have a chance to read the release notes before I launch a new major version and possibly make breaking changes. I hate ruining your day ;)

These are the major differences between the image you mention and this addon-image:

But please choose as you like. It's open-source right?

luckylinux commented 7 months ago

@Expaso

That's quite clear, thanks :+1: .

So, if I understand correctly, your "addon" is not so much an "addon" (at least not for me, if I run it as a container), but rather a "stock" timescaledb-ha (or timescaledb?) with some helper scripts and additional programs/toolkits installed.

Just to clarify: it's basically not something Home Assistant-specific, but rather a "normal" (generic) timescaledb with extra packages. Correct? Maybe I was confused by the name of the image after all...

I am fine running your Container, I'd just like to understand better.

Right now you can argue I don't even have a backup strategy for other containers' postgresql Databases (except ZFS snapshots that is).

So of course I am open to options ;).

I wonder then why postgres and timescaledb-ha do not include the toolkit and pgAgent by default ...

EDIT: alright, this is more where I should be looking for the Container Details: https://github.com/Expaso/hassos-addon-timescaledb/releases rather than this GitHub Repository itself.

expaso commented 7 months ago

Hahaha yes, that is indeed the place to be!

My addon IS a real Home Assistant addon with all the bells and whistles, but you can also run it standalone, simply because some people want to run it in plain Docker or Kubernetes or whatever. It's a feature I added in the latest release, because some people run Home Assistant on a Raspberry Pi but want to run this addon on beefier hardware. It is a fully capable database server, as a matter of fact.

luckylinux commented 7 months ago

I finally managed to get my migration script kinda working now...

Some very tricky-to-diagnose DNS name resolution issues in Podman made me lose a few days :upside_down_face:. I might need to clean up the repos quite a bit...

https://github.com/luckylinux/migrate-sql-database#general

Now I need to decide which Database to use :laughing:.

42Network commented 6 months ago

Hi. I believe there are two small errors in your materialized views: state_min is missing from the list of fields, resulting in an obscure error: ERROR: column "state_count" specified more than once

With state_min added, the error is gone.

CREATE MATERIALIZED VIEW ltss_summary_hourly( entity_id, bucket, state_avg, state_max, state_min, state_count)
WITH (timescaledb.continuous) AS
 SELECT ltss.entity_id,
    time_bucket('1h'::interval, ltss."time") AS bucket,
    avg(ltss.state_numeric) AS state_avg,
    max(ltss.state_numeric) AS state_max,
    min(ltss.state_numeric) AS state_min,
    count(ltss.state_numeric) AS state_count
   FROM ltss
  GROUP BY ltss.entity_id, (time_bucket('1h'::interval, ltss."time"));

CREATE MATERIALIZED VIEW ltss_summary_daily( entity_id, bucket, state_avg, state_max, state_min, state_count)
WITH (timescaledb.continuous) AS
 SELECT ltss.entity_id,
    time_bucket('1d'::interval, ltss."time") AS bucket,
    avg(ltss.state_numeric) AS state_avg,
    max(ltss.state_numeric) AS state_max,
    min(ltss.state_numeric) AS state_min,
    count(ltss.state_numeric) AS state_count
   FROM ltss
  GROUP BY ltss.entity_id, (time_bucket('1d'::interval, ltss."time"));

expaso commented 6 months ago

@42Network You are totally right! Thank you for your fix!! I have edited the original post!

bezigebever commented 3 months ago

Are there any instructions on cleaning the DB (with a reconfig of the recorder) to bring the size of the DB down? I know my top states and have already removed some from the recorder config, but the DB doesn't shrink...

expaso commented 3 months ago

Hi @bezigebever,

Removing data does not reclaim disk space immediately. See also: https://www.postgresql.org/docs/16/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY

If you want to reclaim disk space, run VACUUM FULL on your database.
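
For example (be aware that VACUUM FULL rewrites tables and takes an exclusive lock, so run it at a moment Home Assistant can tolerate a pause):

-- Reclaim space for the whole current database
VACUUM FULL;

-- Or only for a single large table
VACUUM FULL public.ltss;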

bezigebever commented 3 months ago

@expaso thanks for your support. It's been running for 10 minutes now; I suppose it takes quite some time. I'll keep you informed. Do you have some best practices, e.g. on scheduling the vacuum or using compression? DBs are quite new for me, and I think for a lot of people, so I'm willing to do a write-up afterwards if that can help. Really like the effort you put into this add-on!

luisfmrosa commented 1 month ago

Hello @expaso,

Just a quick question: I took too long to upgrade the timescaledb addon, and when I went straight from 1.1.6 to 4.2.0, it failed, as it does not manage the upgrade of PostgreSQL from 12 to the latest version used.

It managed to recover my previous version, but how can I upgrade PostgreSQL to the latest from there?

A bit of a dumb question, but thank you if you can guide me on this.

expaso commented 1 month ago

Hi @luisfmrosa!

Thank you for reaching out! No worries, I will help you! :)

Could you please try these steps and report back to me? https://github.com/expaso/hassos-addon-timescaledb/issues/44#issuecomment-2294893495

luisfmrosa commented 1 month ago

Amazing, it worked!

Thank you very much for your help, much appreciated.

Luis


ovizii commented 6 days ago

Hi there, and thanks for this great add-on. After reading this thread, I'd like to add a few more questions.

After I instruct the recorder to send all its data to Timescale (namely the DB named homeassistant with timescale_enabled) and configure the recorder like this:

recorder:
  db_url: postgresql://postgres:homeassistant@77b2833f-timescaledb/homeassistant
  auto_purge: true
  purge_keep_days: 30

Do I then somehow have to run VACUUM FULL on my homeassistant database via pgAdmin4 manually every now and then, or can we somehow schedule that in HA?

In the description here, there is a (for me) confusing sentence. Would you mind helping me understand:

For the Home Assistant recorder, you probably need some SQL storage too. That means you also need to bring stuff like MariaDB or Postgres to the table (unless you keep using the SQLite database).

So.. why not combine these? Seriously?! You ask...

By this, do you mean to say that with my above described recorder and having configured your add-on like this, I'm done?

databases:
  - homeassistant
timescale_enabled:
  - homeassistant
timescaledb:
  telemetry: basic
  maxmemory: 512MB
  maxcpus: 2
max_connections: 30
system_packages: []
init_commands: []
retry_upgrade: false

I'm wondering because here you mention:

To bring your precious sensor statistics into TimescaleDB, you can use the Long Time State Storage (LTSS) component. Available on HACS.

Which makes me wonder: aren't they already in TimescaleDB? Or did you simply mean that if we'd like to have long-term storage, we could add another DB to the configuration of TimescaleDB (shown above) with timescale_enabled?

The ltss integration states that it needs PostGIS, and your add-on says it is integrated. Is there a step necessary, or is this automatically handled by your add-on?

expaso commented 5 days ago

Hi @ovizii !!

Thanks for reaching out! I'll try to answer your questions one-by-one:

"Do I then somehow have to run VACUUM FULL on my homeassistant database via pgAdmin4 manually every now and then or can we somehow schedule that in HA?"

No, this is not needed. It won't hurt, but PostgreSQL takes care of this automatically under normal conditions.

"By this, do you mean to say that with my above described recorder and having configured your add-on like this, I'm done?"

For the standard recorder, yes. HA normally uses SQLite, but this is not needed anymore now that you have a full-blown PostgreSQL server running. This gives you a single point of data storage, namely: the PostgreSQL database in the TimescaleDB addon.

"Which makes me wonder, aren't they already in TimescaleDB? Or did you simply mean, that if we'd like to have long term storage, we could add another DB to the configuration of TimescaleDB (I showed above) with timescale_enabled?"

Home Assistant has its default recorder component. This component automatically removes data older than 30 days in your setup. That is not really long-term. The LTSS component is basically a copy of the recorder component that does not remove its data at all. Never. (Hence, long-term storage.) Both can point to the same Timescale database, whereby the normal recorder tables will hold detailed information for a maximum of 30 days (I have mine set up for 2 days, btw), and the ltss table (from the LTSS component) will hold detailed state information forever. This table will grow quickly, so setting up compression and/or continuous aggregates is a good idea to keep it in check. See the Timescale docs for more details.

"The ltss integration states that it needs PostGIS and your add-on says it is integrated. Is there a step necessary or is this automatically handled by your add-on?"

Automatically handled. Both Timescale and PostGIS are just extensions of a PostgreSQL database. Just point everything to the homeassistant database (the default setup) and everything should just work; you can then start building queries on your ltss table in the homeassistant database, in Grafana for instance.

Because the recorder component also points to this database, you can also query the states tables of the recorder component. Just be aware that the recorder tables only hold data from the last 30 days in your setup.

Oh, and before I forget: this addon was created before Home Assistant itself started calculating some long-term statistics of its own. You can also query those from the homeassistant database, because they are managed by the default recorder component.

You basically have a one-stop shop for all your Home Assistant data here, capable of providing for all your data needs ;)