PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
23.27k stars 1.02k forks source link

Slow PostgREST startup with many tables #3733

Open maudetes opened 1 week ago

maudetes commented 1 week ago

Environment

Description of issue

We're creating this issue to keep it separated from https://github.com/PostgREST/postgrest/issues/3704, but it is similar. Let us know if you would prefer for us to comment on it instead of this issue.

We have a database with a lot of tables (+60K) that keep on increasing. These tables are independent and added by users.

At deployment time, we have a downtime with the error Could not query the database for the schema cache. Retrying. for as long as the schema cache is not loaded. Startup logs (see also Schema cache queried showing twice):

09/Sep/2024:13:31:05 +0000: Successfully connected to PostgreSQL 15.7 (Debian 15.7-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
09/Sep/2024:13:31:05 +0000: Config reloaded
09/Sep/2024:13:38:27 +0000: Schema cache queried in 5685.3 milliseconds
09/Sep/2024:13:38:27 +0000: Schema cache loaded 68320 Relations, 0 Relationships, 0 Functions, 0 Domain Representations, 4 Media Type Handlers, 1192 Timezones
09/Sep/2024:13:38:27 +0000: Successfully connected to PostgreSQL 15.7 (Debian 15.7-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
09/Sep/2024:13:38:27 +0000: Config reloaded
09/Sep/2024:13:40:01 +0000: Schema cache queried in 5792.6 milliseconds
09/Sep/2024:13:40:01 +0000: Schema cache loaded 68320 Relations, 0 Relationships, 0 Functions, 0 Domain Representations, 4 Media Type Handlers, 1192 Timezones

PostgREST also consumes a lot of RAM, probably due to loading the cache schema from what we've understood. To make sure we don't crash, we've requested 5G.

We have also increased some configs on Postgres side since it seemed to us that they were limiting schema cache loading:

shared_buffers = 2GB
max_locks_per_transaction = 10000

Outside of the schema cache loading at startup, we don't experience any performance issue. We would be interested on having your opinion on using PostgREST with this many tables and ways to mitigate the schema cache loading. All the tables being independent, we actually don't need features such as Resource Embeddings.

wolfgangwalther commented 3 days ago

We have a database with a lot of tables (+60K) that keep on increasing. [..] All the tables being independent, we actually don't need features such as Resource Embeddings.

That makes it possible to create a simple reproducer:

I put this into the test fixtures to create 100k dummy tables:

SELECT format($$
  CREATE TABLE many_tables_%s (
    id int,
    a text
);
$$, generate_series(1,100000)) \gexec

Then I ran postgrest-with-postgresql-16 -f test/spec/fixtures/load.sql postgrest-run. Starting PG takes a while to create those tables... but then:

27/Sep/2024:19:31:15 +0200: Starting PostgREST 12.3 (pre-release)...
27/Sep/2024:19:31:15 +0200: Listening on 0.0.0.0:3000
27/Sep/2024:19:31:15 +0200: Successfully connected to PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
27/Sep/2024:19:31:15 +0200: Listening for notifications on the "pgrst" channel
27/Sep/2024:19:31:15 +0200: Config reloaded
27/Sep/2024:19:31:19 +0200: Schema cache queried in 1216.4 milliseconds
27/Sep/2024:19:31:19 +0200: Schema cache loaded 100272 Relations, 227 Relationships, 143 Functions, 15 Domain Representations, 45 Media Type Handlers, 1194 Timezones

Result 1: 100k tables with two columns each take me 1,2s to query and about 4 seconds to load overall.

Ok, I guess I need more columns then?

After adding 51 more text columns, so a total of 53 columns:

27/Sep/2024:19:36:16 +0200: Starting PostgREST 12.3 (pre-release)...
27/Sep/2024:19:36:16 +0200: Listening on 0.0.0.0:3000
27/Sep/2024:19:36:16 +0200: Listening for notifications on the "pgrst" channel
27/Sep/2024:19:36:16 +0200: Successfully connected to PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
27/Sep/2024:19:36:16 +0200: Config reloaded
27/Sep/2024:19:37:20 +0200: Schema cache queried in 12430.7 milliseconds
27/Sep/2024:19:37:20 +0200: Schema cache loaded 100272 Relations, 227 Relationships, 143 Functions, 15 Domain Representations, 45 Media Type Handlers, 1194 Timezones

Result 2: 100k tables with 53 columns take 12s to query and 1:04 minutes to load overall.

Comparing that to your output, the first time the schema cache loaded it took:

09/Sep/2024:13:31:05 +0000: Config reloaded
09/Sep/2024:13:38:27 +0000: Schema cache queried in 5685.3 milliseconds

That's 5 seconds to query and a whopping 7+ minutes to load overall.

So clearly, I need something else in my schema than just columns.

Cutting the columns to half and adding a single primary key:

27/Sep/2024:19:44:25 +0200: Config reloaded
27/Sep/2024:19:45:01 +0200: Schema cache queried in 7094.1 milliseconds

Result 3: 100k tables with 27 columns and a PK take 7 seconds to query and 36 seconds to load overall.

There must be something else about those tables in the OP that I can't see immediately.