Open maudetes opened 1 week ago
We have a database with a lot of tables (+60K) that keep on increasing. [..] All the tables being independent, we actually don't need features such as Resource Embeddings.
That makes it possible to create a simple reproducer:
I put this into the test fixtures to create 100k dummy tables:
SELECT format($$
CREATE TABLE many_tables_%s (
id int,
a text
);
$$, generate_series(1,100000)) \gexec
Then I ran postgrest-with-postgresql-16 -f test/spec/fixtures/load.sql postgrest-run
. Starting PG takes a while to create those tables... but then:
27/Sep/2024:19:31:15 +0200: Starting PostgREST 12.3 (pre-release)...
27/Sep/2024:19:31:15 +0200: Listening on 0.0.0.0:3000
27/Sep/2024:19:31:15 +0200: Successfully connected to PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
27/Sep/2024:19:31:15 +0200: Listening for notifications on the "pgrst" channel
27/Sep/2024:19:31:15 +0200: Config reloaded
27/Sep/2024:19:31:19 +0200: Schema cache queried in 1216.4 milliseconds
27/Sep/2024:19:31:19 +0200: Schema cache loaded 100272 Relations, 227 Relationships, 143 Functions, 15 Domain Representations, 45 Media Type Handlers, 1194 Timezones
Result 1: 100k tables with two columns each take me 1,2s to query and about 4 seconds to load overall.
Ok, I guess I need more columns then?
After adding 51 more text columns, so a total of 53 columns:
27/Sep/2024:19:36:16 +0200: Starting PostgREST 12.3 (pre-release)...
27/Sep/2024:19:36:16 +0200: Listening on 0.0.0.0:3000
27/Sep/2024:19:36:16 +0200: Listening for notifications on the "pgrst" channel
27/Sep/2024:19:36:16 +0200: Successfully connected to PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
27/Sep/2024:19:36:16 +0200: Config reloaded
27/Sep/2024:19:37:20 +0200: Schema cache queried in 12430.7 milliseconds
27/Sep/2024:19:37:20 +0200: Schema cache loaded 100272 Relations, 227 Relationships, 143 Functions, 15 Domain Representations, 45 Media Type Handlers, 1194 Timezones
Result 2: 100k tables with 53 columns take 12s to query and 1:04 minutes to load overall.
Comparing that to your output, the first time the schema cache loaded it took:
09/Sep/2024:13:31:05 +0000: Config reloaded
09/Sep/2024:13:38:27 +0000: Schema cache queried in 5685.3 milliseconds
That's 5 seconds to query and a whopping 7+ minutes to load overall.
So clearly, I need something else in my schema than just columns.
Cutting the columns to half and adding a single primary key:
27/Sep/2024:19:44:25 +0200: Config reloaded
27/Sep/2024:19:45:01 +0200: Schema cache queried in 7094.1 milliseconds
Result 3: 100k tables with 27 columns and a PK take 7 seconds to query and 36 seconds to load overall.
There must be something else about those tables in the OP that I can't see immediately.
Environment
Description of issue
We're creating this issue to keep it separated from https://github.com/PostgREST/postgrest/issues/3704, but it is similar. Let us know if you would prefer for us to comment on it instead of this issue.
We have a database with a lot of tables (+60K) that keep on increasing. These tables are independent and added by users.
At deployment time, we have a downtime with the error
Could not query the database for the schema cache. Retrying.
for as long as the schema cache is not loaded. Startup logs (see alsoSchema cache queried
showing twice):PostgREST also consumes a lot of RAM, probably due to loading the cache schema from what we've understood. To make sure we don't crash, we've requested 5G.
We have also increased some configs on Postgres side since it seemed to us that they were limiting schema cache loading:
Outside of the schema cache loading at startup, we don't experience any performance issue. We would be interested on having your opinion on using PostgREST with this many tables and ways to mitigate the schema cache loading. All the tables being independent, we actually don't need features such as Resource Embeddings.