askap-vast / vast-pipeline

This repository holds the code of the Radio Transient detection pipeline for the VAST project.
https://vast-survey.org/vast-pipeline/
MIT License
8 stars 3 forks source link

Scaling up Postgres #413

Open srggrs opened 3 years ago

srggrs commented 3 years ago

At the moment queries seems a bit slow, mostly for datatables (I think it does a COUNT and then a SELECT ... FROM ...). I've been suggested (from ADASS conference networking) to look into these things:

Hi Segio, I'd try to figure out where the time is spent. Do you also get 2-5 seconds when you run the queries natively in psql? If yes, then it's the DB of course and unlikely the web app or network overhead. DB optimizations are many, and each gives you a few percent, but for us the largest jump in performance was when last summer we went from HDDs to SSDs. The gains are incredible, and we are now completely bounded by CPU+RAM throughput, not by I/O, (up to about 15 large simultaneous queries running). To speed up CPU-heavy things like QC3 queries further, we'd have to get faster RAM and faster CPUs, and/or come up with parallelized DB solutions (sharding, native parallelization in Postgres 13, etc). Final thought: if you can foresee that you will always be in the neighborhood of a few million rows, then putting the DB in a large RAM disk would make it much faster, too. It's volatile, true, but if you needed to restore it, it'd be fast to do. Hope this is helpful.

It came from this guy:

Adam, at Data Lab (https://datalab.noao.edu/) we use Postgres, with the Q3C extension (https://github.com/segasai/q3c) for spatial queries. You compute a Q3C index on a set of Ra+Dec columns once, and then spatial queries are blazing fast. We have some tables with billions of rows..

srggrs commented 3 years ago

One solution for the COUNT query might be found in https://www.citusdata.com/blog/2016/10/12/count-performance/

More general optimization on PostgreSQL can be found here https://wiki.postgresql.org/wiki/Performance_Optimization

srggrs commented 3 years ago

Also would be worth setting up a Postgres Cluster . . . (e.g. 2 or more containers) and increase the maximum number of connections