Easier way to run specific schema queries

steve-chavez commented 9 months ago

Problem

Currently it's hard to run a single query of the schema cache. This is useful for debugging (to know if a query is taking long) and for developing a feature.

Right now we only support running all the schema cache queries through:

$ postgrest --dump-schema

# or with the Nix tooling
$ nix-shell
$ postgrest-with-postgresql-16 postgrest-run --dump-schema

Proposal

Add a argument to the CLI for running a single query.

$ postgrest --dump-schema --query=allComputedRels

Which would only run

https://github.com/PostgREST/postgrest/blob/71887e4b78aca74adcdabfc61c2901e7e479878f/src/PostgREST/SchemaCache.hs#L873-L916

Solution

Taking the feedback from Wolfgang, it should be:

$ postgrest --dump-schema=full
$ postgrest --dump-schema=relationships
...

wolfgangwalther commented 9 months ago

How about changing --dump-schema from a flag to an option instead? And then do --dump-schema=all or --dump-schema=full for the current full dump.

The option given should also not relate to the internal query, but better to the output. Currently we return this:

{"dbMediaHandlers":[],"dbRelationships":[],"dbRepresentations":[],"dbRoutines":[],"dbTables":[],"dbTimezones":[]}

I suggest we remove those db prefixes (not really helpful information), and then just do --dump-schema=relationships etc. - matching the top-level json keys.

steve-chavez commented 9 months ago

Hm, even with this the underlying query is still hidden unless looking at the pg logs.

Since our schema cache queries are just generated text, maybe we could use PLHaskell for this. Just have a postgrest.schema_cache('relationships') function that users can run from db and inspect.

@wolfgangwalther WDYT?

wolfgangwalther commented 9 months ago

Since our schema cache queries are just generated text, maybe we could use PLHaskell for this. Just have a postgrest.schema_cache('relationships') function that users can run from db and inspect.

Can you elaborate on this? I don't understand the idea here.

We need:

We need a way to better run and inspect the schema cache queries.
Imho, we also need a way for https://github.com/PostgREST/postgrest-openapi to get the results of our schema cache queries. Otherwise we risk having the OpenAPI output and the actual behavior of PostgREST being out of sync.

What about something where PostgREST will run some initial queries for each database connection it starts. This could then create some functions in pg_temp. Those would only be valid for the current session. We could create the schema cache queries as temporary functions instead (which would also allow better error handling for some edge cases in the recursive view parsing, btw). The actual "reload the schema cache" will then just execute those functions.

Advantage: If you want to inspect the results of those queries, you can just create a wrapper around such a temporary function:

create function exposed.relationships () returns ...
language sql as $$
  select pg_temp.schema_cache_relationships();
$$;

You can then call this function via regular REST interface. And the openapi spec can also call this function and get results that will always match what the current PostgREST version will see.

This will likely not work for read-replicas, but in that case it's probably better to have some kind of caching for openapi results anyway, because performance seems to be critical in that case.

This will also slightly affect startup time for new connections. One way to counter that could be to allow passing a schema as config option, on which authenticator has CREATE privileges. PostgREST could then use this schema instead of pg_temp to sync those functions just once after startup. In this case it would be even simpler to call the schema cache functions directly via psql, too.

steve-chavez commented 9 months ago

Can you elaborate on this? I don't understand the idea here.

Will make another issue and also copy your message to continue the discussion.

steve-chavez commented 5 months ago

Idea: If we used mustache templating for our queries, we could potentially use https://github.com/PostgREST/plmustache to run them on the db side and solve this issue.

Maybe we could move them to https://hackage.haskell.org/package/stache.

wolfgangwalther commented 4 months ago

Idea: If we used mustache templating for our queries, we could potentially use https://github.com/PostgREST/plmustache to run them on the db side and solve this issue.

This would also mean we'd move away from parametrized queries in that area, because we'd need to substitute the configuration values (db-schemas, etc) into those queries via mustache.

The queries currently use $1 for parameter references. Those map 1:1 to argument names, when those queries are wrapped in a CREATE FUNCTION - so the approach in https://github.com/PostgREST/postgrest/issues/3215#issuecomment-1939213268 will be very smooth.

Maybe we could move them to https://hackage.haskell.org/package/stache.

This package uses template haskell and fails to cross-compile. I would rather not move that way.

PostgREST / postgrest