PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
23.48k stars 1.03k forks source link

Slow PostgREST startup with many tables #3733

Open maudetes opened 2 months ago

maudetes commented 2 months ago

Environment

Description of issue

We're creating this issue to keep it separated from https://github.com/PostgREST/postgrest/issues/3704, but it is similar. Let us know if you would prefer for us to comment on it instead of this issue.

We have a database with a lot of tables (+60K) that keep on increasing. These tables are independent and added by users.

At deployment time, we have a downtime with the error Could not query the database for the schema cache. Retrying. for as long as the schema cache is not loaded. Startup logs (see also Schema cache queried showing twice):

09/Sep/2024:13:31:05 +0000: Successfully connected to PostgreSQL 15.7 (Debian 15.7-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
09/Sep/2024:13:31:05 +0000: Config reloaded
09/Sep/2024:13:38:27 +0000: Schema cache queried in 5685.3 milliseconds
09/Sep/2024:13:38:27 +0000: Schema cache loaded 68320 Relations, 0 Relationships, 0 Functions, 0 Domain Representations, 4 Media Type Handlers, 1192 Timezones
09/Sep/2024:13:38:27 +0000: Successfully connected to PostgreSQL 15.7 (Debian 15.7-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
09/Sep/2024:13:38:27 +0000: Config reloaded
09/Sep/2024:13:40:01 +0000: Schema cache queried in 5792.6 milliseconds
09/Sep/2024:13:40:01 +0000: Schema cache loaded 68320 Relations, 0 Relationships, 0 Functions, 0 Domain Representations, 4 Media Type Handlers, 1192 Timezones

PostgREST also consumes a lot of RAM, probably due to loading the cache schema from what we've understood. To make sure we don't crash, we've requested 5G.

We have also increased some configs on Postgres side since it seemed to us that they were limiting schema cache loading:

shared_buffers = 2GB
max_locks_per_transaction = 10000

Outside of the schema cache loading at startup, we don't experience any performance issue. We would be interested on having your opinion on using PostgREST with this many tables and ways to mitigate the schema cache loading. All the tables being independent, we actually don't need features such as Resource Embeddings.

wolfgangwalther commented 1 month ago

We have a database with a lot of tables (+60K) that keep on increasing. [..] All the tables being independent, we actually don't need features such as Resource Embeddings.

That makes it possible to create a simple reproducer:

I put this into the test fixtures to create 100k dummy tables:

SELECT format($$
  CREATE TABLE many_tables_%s (
    id int,
    a text
);
$$, generate_series(1,100000)) \gexec

Then I ran postgrest-with-postgresql-16 -f test/spec/fixtures/load.sql postgrest-run. Starting PG takes a while to create those tables... but then:

27/Sep/2024:19:31:15 +0200: Starting PostgREST 12.3 (pre-release)...
27/Sep/2024:19:31:15 +0200: Listening on 0.0.0.0:3000
27/Sep/2024:19:31:15 +0200: Successfully connected to PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
27/Sep/2024:19:31:15 +0200: Listening for notifications on the "pgrst" channel
27/Sep/2024:19:31:15 +0200: Config reloaded
27/Sep/2024:19:31:19 +0200: Schema cache queried in 1216.4 milliseconds
27/Sep/2024:19:31:19 +0200: Schema cache loaded 100272 Relations, 227 Relationships, 143 Functions, 15 Domain Representations, 45 Media Type Handlers, 1194 Timezones

Result 1: 100k tables with two columns each take me 1,2s to query and about 4 seconds to load overall.

Ok, I guess I need more columns then?

After adding 51 more text columns, so a total of 53 columns:

27/Sep/2024:19:36:16 +0200: Starting PostgREST 12.3 (pre-release)...
27/Sep/2024:19:36:16 +0200: Listening on 0.0.0.0:3000
27/Sep/2024:19:36:16 +0200: Listening for notifications on the "pgrst" channel
27/Sep/2024:19:36:16 +0200: Successfully connected to PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
27/Sep/2024:19:36:16 +0200: Config reloaded
27/Sep/2024:19:37:20 +0200: Schema cache queried in 12430.7 milliseconds
27/Sep/2024:19:37:20 +0200: Schema cache loaded 100272 Relations, 227 Relationships, 143 Functions, 15 Domain Representations, 45 Media Type Handlers, 1194 Timezones

Result 2: 100k tables with 53 columns take 12s to query and 1:04 minutes to load overall.

Comparing that to your output, the first time the schema cache loaded it took:

09/Sep/2024:13:31:05 +0000: Config reloaded
09/Sep/2024:13:38:27 +0000: Schema cache queried in 5685.3 milliseconds

That's 5 seconds to query and a whopping 7+ minutes to load overall.

So clearly, I need something else in my schema than just columns.

Cutting the columns to half and adding a single primary key:

27/Sep/2024:19:44:25 +0200: Config reloaded
27/Sep/2024:19:45:01 +0200: Schema cache queried in 7094.1 milliseconds

Result 3: 100k tables with 27 columns and a PK take 7 seconds to query and 36 seconds to load overall.

There must be something else about those tables in the OP that I can't see immediately.

steve-chavez commented 1 month ago

@maudetes Perhaps you have a bloated pg_catalog (many dead tuples)? That can make the schema cache load really slow, see https://github.com/PostgREST/postgrest/issues/3212.

Try running this query to find out.

wolfgangwalther commented 1 month ago

@maudetes Perhaps you have a bloated pg_catalog (many dead tuples)? That can make the schema cache load really slow, see #3212.

The numbers in the OP and my first experiments show that the query itself is fast. It's the Haskell side that's slow.

All the tables being independent, we actually don't need features such as Resource Embeddings.

Are you sure there are no foreign keys on any of those tables, maybe pointing at some other hidden tables, so that they don't show up as relationships?

Are there any views in this schema or are all those exposed endpoints coming from tables?

How much stuff is in the schema mentioned in db-extra-search-path ? By default it's the public schema.

maudetes commented 1 month ago

Hello ! Thank you a lot for your replies and pointers! And sorry for the delay to give you feedback!

We were actually migrating our Postgres to a server with more resources (with an SSD disk and more RAM). I think our Postgres was being slowed by other factors. We were suspecting other heavy queries on other databases, but it could also have been the bloated pg_catalog you mentioned. Indeed, applying a pg_dump --schema-only my-db -f /tmp/pg_dump_schema.sql at that time would hang for quite a long time. I can't give you special measures or logs because we don't reproduce the situation anymore.

Since the server migration (and without having modified our PostgREST conf), we haven't encountered slow startup. See the latest logs:

04/Oct/2024:08:15:39 +0000: Attempting to connect to the database...
04/Oct/2024:08:15:39 +0000: Listening on port 3000
04/Oct/2024:08:15:39 +0000: Connection successful
04/Oct/2024:08:15:39 +0000: Listening for notifications on the pgrst channel
04/Oct/2024:08:15:39 +0000: Config reloaded
04/Oct/2024:08:16:05 +0000: Schema cache loaded
04/Oct/2024:08:16:05 +0000: Attempting to connect to the database...
04/Oct/2024:08:16:05 +0000: Connection successful
04/Oct/2024:08:16:05 +0000: Config reloaded
04/Oct/2024:08:16:31 +0000: Schema cache loaded