balancednetwork / balanced-backend

Backend to balanced stats page and other services
Apache License 2.0
0 stars 1 forks source link

Move time-series data to TimescaleDB #71

Open CyrusVorwald opened 1 week ago

CyrusVorwald commented 1 week ago

The current implementation uses separate PostgreSQL tables for different time granularities (5m, 15m, 1h, etc.). TimescaleDB is a PostgreSQL extension to make SQL scalable for time-series data.

TimescaleDB is chosen over the TICK stack due to relative ease of code changes.

Instead of having separate tables for different time intervals, time-series data can be consolidated. For example:

CREATE TABLE pool_metrics (
    time TIMESTAMPTZ NOT NULL,
    pool_id INTEGER NOT NULL,
    chain_id INTEGER NOT NULL,
    price DOUBLE PRECISION NOT NULL,
    volume DOUBLE PRECISION NOT NULL,
    total_supply DOUBLE PRECISION,
    base_volume DOUBLE PRECISION,
    quote_volume DOUBLE PRECISION,
    base_lp_fees DOUBLE PRECISION,
    quote_lp_fees DOUBLE PRECISION,
    base_baln_fees DOUBLE PRECISION,
    quote_baln_fees DOUBLE PRECISION,
    block_height INTEGER,
    PRIMARY KEY(time, pool_id)
);

CREATE TABLE token_metrics (
    time TIMESTAMPTZ NOT NULL,
    address TEXT NOT NULL,
    chain_id INTEGER NOT NULL,
    price DOUBLE PRECISION NOT NULL,
    volume DOUBLE PRECISION,
    liquidity DOUBLE PRECISION,
    total_supply DOUBLE PRECISION,
    block_height INTEGER,
    PRIMARY KEY(time, address)
);

SELECT create_hypertable('pool_metrics', 'time');
SELECT create_hypertable('token_metrics', 'time');

CREATE INDEX ON pool_metrics (pool_id, time DESC);
CREATE INDEX ON token_metrics (address, time DESC);

Continuous materialized views automatically maintain pre-aggregated summaries of data. For example, pool_metrics can be grouped into 5-minute intervals with OHLCV data as follows:

CREATE MATERIALIZED VIEW pool_metrics_5m
WITH (timescaledb.continuous) AS
SELECT 
    time_bucket('5 minutes', time) AS bucket, -- Group data into 5-min buckets
    pool_id,
    chain_id,
    first(price, time) AS open,    -- First price in interval
    last(price, time) AS close,    -- Last price in interval
    max(price) AS high,            -- Highest price in interval
    min(price) AS low,             -- Lowest price in interval
    sum(volume) AS volume          -- Total volume in interval
FROM pool_metrics
GROUP BY bucket, pool_id, chain_id;

Raw data can be kept for 30 days and summaries can be kept for a year:

SELECT add_retention_policy('pool_metrics', INTERVAL '30 days');
SELECT add_retention_policy('pool_metrics_5m', INTERVAL '365 days');

The old existing tables would need to be migrated to the new ones. Application code such as models, CRUD operations, and cron jobs would need to be updated to reflect the new tables and materialized views. Monitoring may similarly need to be updated.

robcxyz commented 1 week ago

This would be a rather large upgrade but yes, totally agree with the suggestion for timescale but to do the summary stats through materialized views would take a rather large change to backend.