[Epic] Plan Profiling - Githubissues

aalexandrov commented 11 months ago

Product outcome

Currently, internal arrangement sizes are reported only in relation to the rendered dataflow graph that creates them. Consequently, it is hard to figure out which parts of a SQL query are the main contributors for excessive memory consumption. The current approach is to infer this manually by looking at the rendered dataflow graph. This is time consuming, error-prone, and in general impossible for people that are not well-versed in our rendering code. This approach does not scale, and at the moment is effectively limited to the members of the @MaterializeInc/compute team.

Automatically relating internal arrangements to SQL-level syntax fragments is hard, but relating them to the MirRelationExpr (sub-)terms produced in the OPTIMIZED PLAN stage is a tractable problem. As an outcome of this epic, Materialize customers as well as field engineering and engineering ambassadors (who are the more likely first consumers of this feature) should be able to run

EXPLAIN WITH(memory_usage) MATERIALIZED VIEW $item;
EXPLAIN WITH(memory_usage) INDEX $item;

commands on existing materialized view and index items. The output will show the cumulative memory consumption (reported as the sum of the arrangement sizes) associated with each MirRelationExpr in the OPTIMIZED PLAN for the explained item. For example

Explained Query:
  Project (#0..=#24, #26..=#29) // { memory_usage: 0 }
    Join on=(f1_d1 = d1_k1) type=differential // { memory_usage: 0 }
      ArrangeBy keys=[[f1_d1]] // { memory_usage: 153154 }
        Filter (f1_p01 = 5) AND (f1_d1) IS NOT NULL // { memory_usage: 0 }
          ReadIndex on=f1 f1_pk=[*** full scan ***] // { memory_usage: 0 }
      ReadIndex on=d1 d1_pk=[differential join] // { memory_usage: 0 }

Used Indexes:
  - materialize.public.f1_pk (*** full scan ***)
  - materialize.public.d1_pk (differential join)

Edit: Also, we'd like to show not just memory usage, but CPU usage, skewness, hydration state, input size in rows or bytes, etc. Basically, all the things that are shown in introspection relations at the level of dataflow operators, we'd like to map back to the MIR plan.

Discovery

To achieve the outcome of the epic we need to solve the following sub-problems:

Pass lineage from the MIR expressions produced by the OPTIMIZED PLAN to the LIR expressions in the MIR ⇒ LIR lowering step.
Make this lineage visible in the rendered dataflow graphs.
Expose the lineage in our mz_internal tables in a way which allows us to attach this information to the mz_arrangement_sizes data.
Craft a SQL query that extracts arrangement sizes for a specific dataflow. Use the results of that query to seed an Attribute that can be requested with an ExplainConfig flag.

Some of the sub-problems have different viable solutions and we will need to motivate our choice (probably in a design doc).

Work items

Now

- [ ] Define a new `MirId` attribute that attaches to each node a unique number (e.g. a its post-visit number).
- [ ] Extend the LIR structure to have an extra field for keeping track of the `MirId` that produced it.
- [ ] Derive and pass the attribute in the `mir_to_lir` lowering calls.
- [ ] Change rendering in order to reflect the lineage IDs in the generated graphs (for example by using named regions for each operator). This is possibly out of scope for this ticket.

Related release blockers

Related EXPLAIN output corners that need to be polished before we can ship the feature

- [ ] #22064

Decision log

6 October 2023. Issue converted into an epic.
20 September 2023. Issue created.

vmarcos commented 11 months ago

As part of this work, we may want to additionally consider the following:

[ ] Adding an annotation to each TopK node regarding whether it is implementing a distinct on or a limit.

This would allow us to have EXPLAIN output that looks like the following:

CREATE TABLE teachers (id INT, name TEXT);
CREATE TABLE sections (id INT, teacher_id INT, course_id INT, schedule TEXT);

-- Illustrates that new hints apply without ambiguity in a single query block.
CREATE MATERIALIZED VIEW distinct_on_group_by_limit AS
  SELECT DISTINCT ON(teacher_id) id, teacher_id, MAX(course_id)
  FROM sections
  GROUP BY id, teacher_id
  OPTIONS (
      AGGREGATE INPUT GROUP SIZE = 1000,
      DISTINCT ON INPUT GROUP SIZE = 60,
      LIMIT INPUT GROUP SIZE = 50)
  ORDER BY teacher_id, id
  LIMIT 2;

EXPLAIN PLAN FOR MATERIALIZED VIEW distinct_on_group_by_limit;

-- With lineage, we could improve the EXPLAIN output as follows:
CREATE TABLE
CREATE TABLE
CREATE MATERIALIZED VIEW
                                  Optimized Plan                                  
-------------------------------------------------------------------------------------
 materialize.public.distinct_on_group_by_limit:                                     +
   TopK order_by=[#1 asc nulls_last, #0 asc nulls_last] limit=2 input_group_size=50 +
     TopK group_by=[#1] order_by=[#0 asc nulls_last] distinct on input_group_size=60+
       Reduce group_by=[#0, #1] aggregates=[max(#2)] input_group_size=1000          +
         Project (#0..=#2)                                                          +
           ReadStorage materialize.public.sections                                  +

(1 row)

aalexandrov commented 11 months ago

@vmarcos isn't distinct on just the same as limit=1 in the presence of group_by?

vmarcos commented 11 months ago

@vmarcos isn't distinct on just the same as limit=1 in the presence of group_by?

@aalexandrov It's not a bidirectional mapping, unfortunately. Consider the following example:

CREATE TABLE teachers (id INT, name TEXT);
CREATE TABLE sections (id INT, teacher_id INT, course_id INT, schedule TEXT);

CREATE MATERIALIZED VIEW alt_section_of_top_course_per_teacher AS
SELECT id AS teacher_id, 
       (SELECT id AS section_id 
        FROM sections
        WHERE teacher_id = teachers.id
        OPTIONS (LIMIT INPUT GROUP SIZE = 1000)
        ORDER BY course_id DESC
        LIMIT 1)
FROM teachers;

EXPLAIN PLAN FOR MATERIALIZED VIEW alt_section_of_top_course_per_teacher;

                                    Optimized Plan                                     
---------------------------------------------------------------------------------------
 materialize.public.alt_section_of_top_course_per_teacher:                            +
   Return                                                                             +
     Project (#0, #2)                                                                 +
       Join on=(#0 = #1) type=differential                                            +
         ArrangeBy keys=[[#0]]                                                        +
           Get l0                                                                     +
         ArrangeBy keys=[[#0]]                                                        +
           Union                                                                      +
             Get l2                                                                   +
             Map (null)                                                               +
               Union                                                                  +
                 Negate                                                               +
                   Project (#0)                                                       +
                     Get l2                                                           +
                 Get l1                                                               +
   With                                                                               +
     cte l2 =                                                                         +
       Project (#0, #1)                                                               +
         TopK group_by=[#0] order_by=[#2 desc nulls_first] limit=1 exp_group_size=1000+
           Project (#0, #1, #3)                                                       +
             Filter (#0) IS NOT NULL                                                  +
               Join on=(#0 = #2) type=differential                                    +
                 ArrangeBy keys=[[#0]]                                                +
                   Get l1                                                             +
                 ArrangeBy keys=[[#1]]                                                +
                   Project (#0..=#2)                                                  +
                     Filter (#1) IS NOT NULL                                          +
                       ReadStorage materialize.public.sections                        +
     cte l1 =                                                                         +
       Distinct group_by=[#0]                                                         +
         Get l0                                                                       +
     cte l0 =                                                                         +
       Project (#0)                                                                   +
         ReadStorage materialize.public.teachers                                      +
                                                                                      +
 Source materialize.public.sections                                                   +
   filter=((#1) IS NOT NULL)                                                          +

(1 row)

antiguru commented 11 months ago

Briefly discussed this epic w/@teskje. We believe that this epic requires a clearer product story:

Querying the introspection data as part of an explain might be surprising, specifically because we'd need to construct a dataflow to collect the data.
The data is specific to a replica, so the explain should mention which replica to query.
We could log a subset of the introspection data through environmentd, which would solve querying introspection data. However, it'd be better to have a general solution instead of a specific for this issue.

Other than that, we think it'd be a nice addition to Mz to surface size information more widely.

lfest commented 4 months ago

Per Q2 Roadmap (i.e. planning) discussions at the on-site:

Compute team agrees there are still unknowns to iron out
The final piece (moving the data out to where it can be queried/used) depends on Unified Introspection to have been implemented
Part of the implementation will require moving IDs from the Optimized Plan's nodes to the Physical Plan's nodes
Memory reporting will be implemented at the level of Physical Plan nodes
ArrangeBy has multiple operators so there will be multiple numbers for that

est: ~4~ 5 weeks

lfest commented 4 months ago

Aargh. The estimate was 5 weeks, had a typo.

ggevay commented 4 months ago

Querying the introspection data as part of an explain might be surprising, specifically because we'd need to construct a dataflow to collect the data.

This would have complicated things, but fortunately in the meantime Unified Introspection will solve this: we'll just need to read from Persist. See 2. above.

Compute team agrees there are still unknowns to iron out

Some design questions off the top of my head:

How to specify the replica? Maybe the same way as replica-targeted queries.
How fine-grained should the shown info be? E.g.:
- Hierarchical aggregations: every stage or just the total?
- Differential join intermediate arrangements?
- ArrangeBy with multiple keys.
- Final index arrangement.
How to emphasize big ones? Maybe collect at the end of explain? ***31 GB***?

Memory reporting will be implemented at the level of Physical Plan nodes

We are planning to show it on the MIR plan ("optimized plan", which the default EXPLAIN shows), because that is much more readable, and easier for the user to map back to SQL.

Part of the implementation will require moving IDs from the Optimized Plan's nodes to the Physical Plan's nodes

It's actually the other way around: We'll have to map back from LIR node ids to MIR node ids. Also, it turns out that we don't yet have a mapping even from rendered node ids to LIR, so we need to also do that here.

ggevay commented 3 months ago

I've renamed this to "Plan Profiling", because we want to show not just memory usage, but other things, e.g., CPU usage.

ggevay commented 3 months ago

Discussion on why rely on Unified Introspection for this: https://materializeinc.slack.com/archives/C074FURKP8W/p1718293082913619?thread_ts=1718127372.822979&cid=C074FURKP8W

ggevay commented 1 month ago

https://github.com/MaterializeInc/materialize/pull/27316 is a draft PR from Alex implementing a part of this.

MaterializeInc / materialize

[Epic] Plan Profiling #21835

Product outcome

Discovery

Work items

Now

Related release blockers

Next

Decision log