ankane / pgslice

Postgres partitioning as easy as pie
MIT License
1.11k stars 67 forks source link

Rails: How to perform SQL Migrations without excessive diff #46

Closed andyatkinson closed 1 year ago

andyatkinson commented 1 year ago

While PgSlice is a Ruby gem not specific to Rails, it has some Rails info and I’m personally using it only with Rails and Migrations. https://github.com/ankane/pgslice#rails

I've created a partitioned table and partitions, using the --dry-run functionality of PgSlice to print the SQL statements. I’ve pasted the SQL statements into new Migration files.

I'm using the execute() method within Rails migrations supplying SQL as heredoc text. When I migrate, the dumped schema in db/structure.sql includes the expected partitioned tables and partitions, but unfortunately also touches everything else causing a large diff of unrelated schema objects that are not actually changing. The diff is so large it's not viewable on github either, although the diff can be viewed locally with git.

What I’m trying to do is have the changes to db/structure.sql only reflect what’s changed. I don't see any useful options to the Schema Dumper in Rails for the SQL format.

I'm wondering if PgSlice has any recommendations on how to introduce new partitions in Rails Migrations, since Rails does not support PostgreSQL partitions with create_table to my knowledge, in a way that limits the dumps the schema in a way that doesn’t cause an excessive diff.

Thanks.

andyatkinson commented 1 year ago

@jnunemaker suggests https://www.johnnunemaker.com/rails-postgres-native-partitioning/ for local dev:

ankane commented 1 year ago

Hey @andyatkinson, my recommendation would be to only partition in production/staging and keep it outside of Rails (breaking dev/prod parity). If you do want to keep it in Rails, you'll probably need to switch to db/structure.sql (which will have a one-time large diff) or create a gem that can export partitioned tables (but in either case, keeping partitions up-to-date will be a chore).

andyatkinson commented 1 year ago

Hi @ankane. We do use the SQL dump format.

my recommendation would be to only partition in production/staging

Ok, I was hoping to bring prod/local together, but I'm running into these problems.

In the linked post from @jnunemaker, he wrote code to manage partitions in prod that could be invoked from cron, and also callable in local. That's nice because it makes the solution testable locally.

I'm currently considering splitting the difference, and having a partitioned table and single default partition in local. pgslice doesn't currently create DEFAULT partitions, so I just made the SQL for that now. If that works out I'll write it up and share.

We may also pursue possible future partition management locally as well, but it's not needed by the data itself, which is seed data that's rebuilt, and would only be in the "current" partition (in for example, a monthly partition strategy).

I guess the second choice would be to punt on partitions entirely in local.

Thanks.