PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License

Slow schema cache loading and double caching schema #3704

Open MHC2000 opened 3 months ago

MHC2000 commented 3 months ago

Environment

As we use a lot of tables, the initial loading and reloading of the schema cache takes more than 15 minutes. The following log was created right after a fresh start of PostgREST. I see two config reloads, although none was triggered at this time.

The API is very slow during this loading time.

Only queries without joins to other tables run in normal time. If a query includes joins to other tables, the first such query takes several minutes to execute.

In parallel I've queried the admin port. The ready endpoint still returns 503 after "Schema cache loaded" is shown in the log file, and it is still 503 after the second reload of the config and schema. The config and live endpoints return 200 immediately. The schema cache is roughly 12 MB. So in total the system needs about 10 to 15 minutes to load the whole structure.
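For reference, the admin-port checks described above can be reproduced roughly like this (a sketch; `localhost:3001` is assumed from the "Admin server listening on port 3001" log line, and the endpoint paths are the ones exposed by the PostgREST admin server):

```shell
# Liveness and config report 200 almost immediately after startup
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3001/live
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3001/config

# Readiness stays 503 until the schema cache is actually usable
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3001/ready

# Dump the schema cache (about 12 MB in this case) to inspect its size
curl -s http://localhost:3001/schema_cache -o schema_cache.json
```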

I know our structure is quite large, and maybe it's too large for PostgREST. But it would be interesting to know why the config and schema seem to be loaded twice.

23/Aug/2024:18:57:36 +0200: Starting PostgREST 12.2.3...
23/Aug/2024:18:57:36 +0200: Admin server listening on port 3001
23/Aug/2024:18:57:36 +0200: Listening on port 3000
23/Aug/2024:18:57:36 +0200: Successfully connected to PostgreSQL 13.14 (Ubuntu 13.14-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
23/Aug/2024:18:57:36 +0200: Config reloaded
23/Aug/2024:18:57:40 +0200: Schema cache queried in 547.9 milliseconds
23/Aug/2024:18:57:40 +0200: Schema cache loaded 1286 Relations, 4921 Relationships, 468 Functions, 0 Domain Representations, 4 Media Type Handlers, 1222 Timezones
23/Aug/2024:19:03:31 +0200: Successfully connected to PostgreSQL 13.14 (Ubuntu 13.14-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
23/Aug/2024:19:03:31 +0200: Config reloaded
23/Aug/2024:19:03:35 +0200: Schema cache queried in 503.1 milliseconds
wolfgangwalther commented 3 months ago

I know our structure is quite large

Is there any chance you could share a schema-only dump privately? The interesting bit is obviously the 4921 relationships, and those are not easy to mass-generate with a script.

MHC2000 commented 3 months ago

@wolfgangwalther I could remove the functions from the api schema and share only the views (the api schema contains only views and functions, no tables). But these views are representations of tables in other schemas, so I'm not sure if that helps. Is there any way to send it privately over GitHub, or some other way?

wolfgangwalther commented 3 months ago

Hm, the dump would only help with all schemas included. The problem is most likely that too many unneeded objects from other schemas are returned before being filtered out in Haskell. So we'd need all of the views and tables in all schemas. Functions can also create relationships, but most likely they can be left out. In any case, the dump needs to be self-contained, so it can be run in a fresh database to create all objects.
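Assuming plain `pg_dump` is an option, a self-contained schema-only dump of every schema (no data, no ownership or grant statements) could be produced along these lines; `mydb` is a placeholder for the actual database name:

```shell
# Dump only object definitions from all schemas, stripped of
# owner and privilege statements so it restores into a fresh DB.
pg_dump --schema-only --no-owner --no-privileges \
        --file=schema.sql mydb
```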

You could send a link to download via email to info at postgrest dot org.

MHC2000 commented 3 months ago

OK, understood. I'll see what I can do; not sure if I can provide that.

steve-chavez commented 3 months ago

@MHC2000 If possible, you could try to mangle the table and column names. That would certainly help us debug this faster.

MHC2000 commented 3 months ago

We are talking about a three-digit number of tables involved, and I guess a four-digit number of column names. I'm not sure how to do that in a manageable manner. I'll get back to you on Monday.

MHC2000 commented 2 months ago

Sadly, I'm not able to provide a structure dump in the near future. I will get back to you as soon as I can provide a structure large enough to provoke the behaviour.

gustavorps commented 2 months ago

+1