Run queries that depend only on system catalog relations in the mz_introspection cluster

chaas commented 1 year ago

Update: see comments below, this issue now covers running all queries that depend only on system catalog relations in the mz_introspection cluster, with a new session variable for disabling this functionality.

~Blocked by https://github.com/MaterializeInc/materialize/issues/17608.~

Always running SHOW commands in the mz_introspection cluster has a number of benefits: 1) If the customer has gotten into a bad state, e.g. by running SET CLUSTER = <invalid cluster name>, the SHOW command will still return and help the user navigate out of their bad state. This avoids the user getting "stuck". 2) These queries will return faster, because we have default indexes for the SHOW command underlying queries in the mz_introspection cluster.

benesch commented 1 year ago

@aalexandrov reported today on Slack that psql backslash commands (e.g. \d, \dt) are slow: https://materializeinc.slack.com/archives/C02FWJ94HME/p1678714308555629.

That got me thinking that this issue isn't worth tackling as written, because making SHOW commands fast by auto-running them on mz_introspection does nothing to make \d and friends fast.

There's an alternate solution (h/t @aalexandrov) of automatically routing any queries that depend only on system catalog relations (those in mz_catalog, pg_catalog, and mz_internal) to the mz_introspection cluster. On the one hand: this is a bit of an ugly special case. On the other hand: I think it might be the only special case we'd need, and it's straightforward enough to explain. The special case would cover \d and friends, SHOW commands, the introspection queries run by dbt... and probably introspection queries run by basically any other PostgreSQL tool.

I think we'd need to provide a way to opt out of this—perhaps a session variable like force_introspection_cluster that defaults to true?—but the more I think about it the more I like it. It would allow us to keep nested SHOW commands (i.e., abandon #17608), because e.g. SELECT name FROM (SHOW TABLES) would still be run on mz_introspection and be fast.

aalexandrov commented 1 year ago

Note that this might affect observability metrics that currently rely on recurring SHOW as an evidence for system health. If this is implemented one would not be able to infer that the default cluster is up and running from that.

cc @jpepin / @umanwizard

jpepin commented 1 year ago

Our primary environment health metric, mz_views_query_successful, is used to determine if an environment is able to respond to sql and talk to s3 by running SELECT count(*), 1 AS mz_views_query_successful FROM mz_views. @umanwizard would the utility of this metric be affected by the changes being discussed?

benesch commented 1 year ago

Our primary environment health metric, mz_views_query_successful, is used to determine if an environment is able to respond to sql and talk to s3 by running SELECT count(*), 1 AS mz_views_query_successful FROM mz_views. @umanwizard would the utility of this metric be affected by the changes being discussed?

That's a great call out. We'd want to specifically ensure that the prom SQL exporter sets force_introspection_cluster = false, in order to preserve the current behavior of that query.

umanwizard commented 1 year ago

Yes, thanks for raising it @jpepin

chaas commented 1 year ago

Note for implementation: @benesch used the session variable name force_introspection_cluster as a placeholder, and we should think thru what the best name is for this variable

Also: consider using LaunchDarkly for rolling this out?

umanwizard commented 1 year ago

Raising a point that I talked about in person with @chaas , @jkosh44 and @parker-timmerman : currently it will not be enough to check whether a query only depends on values in particular schemas in order to determine whether it is an "introspection query" that should be automatically run on mz_introspection. The reason is that there are several relations in mz_internal that must be queried on particular clusters/replicas: notably the per-replica introspection sources. These have different values on every replica and so issuing them all on mz_introspection does not make sense.

On the other hand, we can't exclude mz_internal from the "should be run on the introspection cluster" logic, since there are some things in there that this logic should in fact apply to, e.g. mz_cluster_replica_utilization.

Cc @teskje who has more context on the ideas around how to distinguish these relations from the "normal" non-per-replica relations.

AFAIK there are roughly three ideas for distinguishing these objects from normal objects:

Put them all in their own schema
Prepend them with a particular string
Force the user to query them with a special SQL syntax, e.g. SELECT IN REPLICA .... (This is my personal preference).

benesch commented 1 year ago

@umanwizard that's just a documentation/explanation issue though, right? In the coordinator we already know what's a replica-local relation and what's not (see check_no_invalid_log_reads), and so we should already be able to write the necessary code. I think the rule is:

If a query depends a) only on relations in mz_catalog and mz_internal and b) no relations that are replica local, and the auto_run... variable is true, run the query on the mz_introspection cluster instead of the active cluster.

umanwizard commented 1 year ago

If we are able to distinguish that in the code then it's fine. I'm just pointing out that using schema membership to distinguish them is not sufficient. @parker-timmerman can you check in your PR whether we can use the strategy Nikhil proposed?

do-not-use-parker-timmerman commented 1 year ago

This all totally makes sense, thanks for the context @umanwizard and @benesch!

I'm struggling a bit to see how check_no_invalid_log_reads determines if a relation is replica-local though. It seems like what we're checking is if we have no log sources, more than one replica, or if there is not specific target replica set, then we'll bail. I don't know what log sources are though, here they're collected from a method called arranged_introspection_dependencies which seems to do DFS to find all CatalogItem::Log we transitively depend on. Maybe this is what I'm missing?

To add to Brennan's list above could a 4th way to detect this to be adding a new field to BuiltinView something like is_replica_local?

umanwizard commented 1 year ago

Yes, "arranged introspection sources", CatalogItem::Log, and "per-replica objects" are all basically the same concept. Sorry that the naming around them is so messy.

teskje commented 1 year ago

FWIW, there is now a way to collect all log/introspection/per-replica relations in SQL:

WITH MUTUALLY RECURSIVE
    log_ids (id text) AS (
        SELECT id FROM mz_sources WHERE type = 'log'
        UNION
        (
            SELECT object_id AS id
            FROM mz_internal.mz_object_dependencies
            WHERE referenced_object_id IN (SELECT * FROM log_ids)
        )
    )
SELECT * FROM log_ids;

It requires WMR so cannot be used in production yet, but maybe still instructive.

But yes, as you have already found out, in Rust you can just use arranged_introspection_dependencies, which does the same recursive lookup.

do-not-use-parker-timmerman commented 1 year ago

Yes, "arranged introspection sources", CatalogItem::Log, and "per-replica objects" are all basically the same concept. Sorry that the naming around them is so messy.

Great! Thanks for the help @umanwizard! I updated #18310 to use the arranged_introspection_dependencies method to check if anything we depend on is per-replica, and update the tests to exercise these scenarios.

do-not-use-parker-timmerman commented 1 year ago

Got auto-closed by accident

MaterializeInc / materialize

Run queries that depend only on system catalog relations in the mz_introspection cluster #17609