freol35241 / ltss

Long time state storage (LTSS) custom component for Home Assistant using Timescale DB
MIT License
77 stars 21 forks source link

Reduce LTSS database size #119

Open ErikApption opened 1 year ago

ErikApption commented 1 year ago

I am not an expert with LTSS and looking through the PRs and issues - it looks like there have been many attemps but I am wondering if there is any easy and practical solution to reduce the size of the LTSS database. My database is 17Gb and this doesn't feel very efficient. I did experiment in the past with other timeseries databases and out of the box the storage was more optimized.

Any recommendations to compact the size of the ltss table?

rikroe commented 1 year ago

Currently, LTSS does not use TimescaleDB's compression feature (c.f. https://docs.timescale.com/use-timescale/latest/compression/about-compression/).

There is #44 which enables compression but also splits out the attributes to a separate table.

I just enable compression to my LTSS hypertable today, you should be able to do that without issue (backup first!):

alter table "ltss" set (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'entity_id'
);
select add_compression_policy('ltss', interval '7 days');

This will compress data after 7 days. Be careful as with compressed chunks you cannot delete anymore - more information in the TimescaleDB docs.

For me, the size of the hypertable was reduced from ~4GB to 325MB.

However, be sure that you are on the latest LTSS: https://github.com/freol35241/ltss/issues/100

sithmein commented 11 months ago

I played a bit today and found a way with which you can save some space. Currently the entity_id is text and there is only a limited set of entity IDs (around 700 in my case but most rows come from only a few entities). Therefore I extracted the entities into a separate table and only store references to the entities in the main table. With some basic Postgres magic this can even be done without changing any code in HA.

  1. Create the new entities table:
    CREATE TABLE entities (
    id SERIAL NOT NULL UNIQUE, -- this also creates an index
    entity_id TEXT NOT NULL PRIMARY KEY
    );
  2. Change the entity_id column from TEXT to INTEGER. This is a high-level description since it's not straight-forward on an existing table. I create a completely new table and copied data over after the following steps have been performed.
  3. Rename the main table from ltss to ltss_data (for example).
  4. Create a view that combines the main table with the entities table:
    CREATE OR REPLACE VIEW ltss AS
     SELECT l."time", e.entity_id, l.state, l.attributes, l.location, l.bucket_size FROM ltss_data l INNER JOIN entities e ON l.entity_id = e.id;
  5. Create a trigger function that converts the string entity_id into a numeric entity_id using the value from the new entities table:

    CREATE OR REPLACE FUNCTION normalize_entity_id_trigger_function() RETURNS TRIGGER AS $$
    DECLARE
    entity_id INTEGER;
    BEGIN
    SELECT id INTO entity_id FROM entities e WHERE e.entity_id = NEW.entity_id;
    IF entity_id IS NULL THEN
        INSERT INTO entities (entity_id) VALUES (NEW.entity_id);
        SELECT id INTO entity_id FROM entities e WHERE e.entity_id = NEW.entity_id;
    END IF;
    
    INSERT INTO ltss_data ("time", entity_id, state, attributes, location) VALUES
        (NEW.time, entity_id, NEW.state, NEW.attributes, NEW.location);
    RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
  6. Create a trigger on the ltss view which redirects inserts to the data table and replaces the text ID with the numeric ID:
    CREATE TRIGGER normalize_entity_id_trigger INSTEAD OF INSERT ON ltss FOR EACH ROW EXECUTE PROCEDURE  normalize_entity_id_trigger_function();

The HA code still inserts into ltss but data gets normalized automatically. In my case this saved about 20% of size (data plus indexes). The only downside is that when querying on the view and filter by entity, the index cannot be used since Postgres isn't smart enough the push the filter down due to the JOIN. Therefore you have to write an explicit query yourself that first filters on the entity_id and the does the join with the entities table.

Kirbo commented 3 months ago
SELECT pg_size_pretty( hypertable_size('public.ltss'));

image

I'd be very much interested in this feature as well. I've been running this Home Assistant + TimescaleDB + LTSS ever since 1st of January 2023 and my ltss table is currently 102 GB 😅 And yes, I am already excluding most of the irrelevant entities (or to be more precise: I'm only including the ones I actually care about), so any kind of way to reduce the size would be more than welcome!

Initially I was thinking of reducing the commit interval for ltss (https://github.com/freol35241/ltss/issues/201), as I don't need the ltss to store the values more than once per 1-15 minutes or so, but currently I think the commit interval is identical with the recorder, which currently commits once per 2 seconds, but any kind of compression or something would be more than welcome as well!