datamade / how-to

📚 Doing all sorts of things, the DataMade way

Partitioning large tables in PostgreSQL #102

Open fgregg opened 4 years ago

fgregg commented 4 years ago

Background

In the courts project we have some tables that are in the hundreds of millions of rows. Working with these data is pretty slow, and at this size, smart indexing doesn't seem to be getting us much benefit.

Partitioning tables might be a helpful strategy here. As long as we keep whole case sequences within the same partition, we should get the benefits of better disk locality and smaller per-partition indices. I think it makes sense to partition by case number range.
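For concreteness, here's a minimal sketch of what declarative range partitioning by case number could look like. The table and column names (`docket_events`, `case_number`) are illustrative, not the project's actual schema, and the range boundaries are arbitrary:

```sql
-- Parent table, partitioned by range on case_number.
-- Names and ranges here are hypothetical.
CREATE TABLE docket_events (
    case_number bigint NOT NULL,
    event_date  date,
    description text
) PARTITION BY RANGE (case_number);

-- One partition per block of case numbers, so every event in a
-- given case sequence lands in the same partition.
CREATE TABLE docket_events_p0 PARTITION OF docket_events
    FOR VALUES FROM (0) TO (1000000);
CREATE TABLE docket_events_p1 PARTITION OF docket_events
    FOR VALUES FROM (1000000) TO (2000000);

-- An index on the parent is automatically created on each
-- partition, giving smaller per-partition indexes.
CREATE INDEX ON docket_events (case_number);
```

Rows inserted into the parent are routed to the matching partition automatically, and queries filtering on `case_number` should let the planner prune partitions that can't contain matching rows.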

I don't really know if this will be helpful for the types of queries we run, but I suppose that's what R&D is for!

Proposal

For a realistic large table (like the courts project's docket events), try out some different partitioning strategies and see if they improve performance on the types of queries we run against it (sample queries can be taken from various analyses).
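The comparison itself could be as simple as running each sample query through `EXPLAIN (ANALYZE)` on both the unpartitioned and partitioned tables and recording the timings. Table and column names below are the same hypothetical ones as above:

```sql
-- Run against both versions of the table and compare plans and
-- execution times. With range partitioning, the planner should
-- prune down to the partition(s) covering this case number range.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM docket_events
WHERE case_number BETWEEN 1500000 AND 1600000;
```

`BUFFERS` is useful here because the hoped-for win is mostly about disk access: fewer pages read per query should show up directly in the buffer counts.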

Deliverables

A comparison of query performance between the unpartitioned and partitioned versions of the table, along with a writeup of the most successful partitioning strategy.

Timeline

Probably two days: one to get the data set up, and a second to tweak and run various queries against it.