flexion / ef-cms

An Electronic Filing / Case Management System.

Automate TTL in Postgres #10508

Open pixiwyn opened 2 months ago

pixiwyn commented 2 months ago

PostgreSQL has no built-in Time-to-Live (TTL) feature for automatically expiring or deleting rows after a set period, as some NoSQL databases (e.g., MongoDB) offer. However, you can implement TTL behavior in Amazon Aurora PostgreSQL (or any PostgreSQL database) using a combination of techniques, such as:

  1. Using cron Jobs or pg_cron Extension

    Amazon Aurora PostgreSQL supports the pg_cron extension, which lets you schedule periodic tasks to run within the database. You can use pg_cron to schedule a job that periodically deletes or archives rows that are older than a specific duration.

Example SQL query you could schedule:

```sql
DELETE FROM my_table WHERE created_at < NOW() - INTERVAL '30 days';
```

This would delete records older than 30 days.
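Scheduling that query with pg_cron could look like the following sketch. The job name and nightly schedule are illustrative; on Aurora, pg_cron must also be added to `shared_preload_libraries` via the cluster parameter group before the extension can be created.

```sql
-- Enable the extension once (requires pg_cron in shared_preload_libraries):
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Run the cleanup every night at 03:00 (job name and schedule are illustrative):
SELECT cron.schedule(
  'ttl-my_table',
  '0 3 * * *',
  $$DELETE FROM my_table WHERE created_at < NOW() - INTERVAL '30 days'$$
);
```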
  2. Triggers with Expiry Logic

    Another option is to pair a timestamp column (e.g., created_at or expires_at) with a trigger that deletes or archives expired rows. Keep in mind that PostgreSQL triggers fire only in response to statements such as INSERT or UPDATE, not on a timer, so expired rows linger until the next qualifying write. This approach can also be less efficient than pg_cron for larger datasets.
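A minimal sketch of this pattern, assuming my_table has an expires_at column (the function and trigger names here are hypothetical): each INSERT also sweeps out any rows whose expiry has passed.

```sql
-- Statement-level trigger: every INSERT into my_table also purges expired rows.
CREATE OR REPLACE FUNCTION purge_expired() RETURNS trigger AS $$
BEGIN
  DELETE FROM my_table WHERE expires_at < NOW();
  RETURN NULL;  -- return value is ignored for statement-level AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_ttl
  AFTER INSERT ON my_table
  FOR EACH STATEMENT
  EXECUTE FUNCTION purge_expired();
```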

  3. Partitioning

    If the TTL is tied to time (e.g., daily, weekly, or monthly data), you can use PostgreSQL partitioning to store data in separate partitions based on a timestamp field. Then, you can simply drop older partitions, which is much faster than deleting individual rows. Example:

    ```sql
    CREATE TABLE my_table (
      id serial,
      data text,
      created_at timestamp NOT NULL,
      -- on a partitioned table, a primary key must include the partition key
      PRIMARY KEY (id, created_at)
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE my_table_2023_09 PARTITION OF my_table
      FOR VALUES FROM ('2023-09-01') TO ('2023-10-01');
    ```

    To "expire" data, you can drop the partition after a certain time.
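Expiring the September 2023 partition from the example above is then a single, cheap metadata operation rather than a row-by-row DELETE:

```sql
-- Drop the expired partition outright:
DROP TABLE my_table_2023_09;

-- Or keep the data around (e.g., for archiving) but remove it from queries:
-- ALTER TABLE my_table DETACH PARTITION my_table_2023_09;
```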

  4. Application-Level TTL

    You could implement TTL logic at the application level by querying the database for rows older than a certain duration and deleting or archiving them in your application's backend code.
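As a rough sketch of the application-level approach in Python, assuming a DB-API connection (e.g., psycopg2); my_table and the 30-day window follow the earlier examples, and purge_expired is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=30)  # matches the 30-day window used above


def ttl_cutoff(now=None):
    """Timestamp before which rows are considered expired."""
    now = now or datetime.now(timezone.utc)
    return now - TTL


def purge_expired(conn):
    """Delete expired rows via any DB-API connection (e.g., psycopg2)."""
    with conn.cursor() as cur:
        cur.execute(
            "DELETE FROM my_table WHERE created_at < %s",
            (ttl_cutoff(),),
        )
    conn.commit()
```

Your backend would call purge_expired on whatever schedule it already has available (a worker queue, a cron container, etc.).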

Considerations:

- Performance: Deleting large volumes of data can be slow. Using partitioning and dropping partitions instead of deleting individual rows can significantly improve performance.
- Vacuum: Frequent deletions may require you to run VACUUM more often to reclaim storage space and maintain performance.

In Amazon Aurora PostgreSQL, using pg_cron is probably the most straightforward and efficient solution for implementing TTL-based deletion without needing to manage complex infrastructure.