[Epic] Make `EXPLAIN` powerful enough to easily troubleshoot

The Q2 Roadmap for this longer-term Epic is to do the following changes in the MIR-based plan:

Replace "arity" with "number of columns"
Change the "#" indicator with the name of the column. If the same column name shows up from different tables, prepend with the "." syntax
If we can get an estimate of the number of rows that will be queried from table/view/etc, add that information to the plan.
Expose when differential joins are part of the plan, possibly with some kind of syntax highlighting or outright explanation that additional memory use is involved in this kind of join
Eliminate showing the ArrangeBy that is part of a ReadIndex
Come up with a design that will convey long plans in a much shortened form, possibly with the short form being default, and the long form exposed with use of a "verbose" option
Mark min/max and topk that it's expensive. (maybe surround with ***, maybe collect them at the top or end, or explain insights )
Creating index on something that's filtered and/or projected. Take this into account in plan insights and/or write it down somewhere how to figure out whether you should create the index.
est 3 weeks
aalexandrov commented 4 months ago
- Replace "arity" with "number of columns"
Can do!
- Change the "#" indicator with the name of the column. If the same column name shows up from different tables, prepend with the "." syntax
This was the case before, but it was changed to the current syntax in #22959 in order to address some internal feedback from Frank. The main reason is that column names (even if they are fully qualified) can be ambiguous.
- If we can get an estimate of the number of rows that will be queried from table/view/etc, add that information to the plan.
We can, but this requires some additional plumbing of storage statistics through the optimizer. Also, currently our cardinality estimation is quite poor. Overall I think investing in this might take disproportionate time from the allocated three weeks with questionable utility.
- Expose when differential joins are part of the plan, possibly with some kind of syntax highlighting or outright explanation that additional memory use is involved in this kind of join
I suggest to add a "Insights" section at the end of the plan (after the "Notices" section) that includes this information. We should exclude insights for differential joins with two inputs (they cannot be delta joins so we can't do better). For example:
```
Insights:
- Potentially memory-expensive differential join with and 4 inputs.
```
Alternatively, we can make this an optimizer notice and render it under Notices.
- Eliminate showing the ArrangeBy that is part of a ReadIndex
Can do!
- Come up with a design that will convey long plans in a much shortened form, possibly with the short form being default, and the long form exposed with use of a "verbose" option.
I'm not entirely sure how to do that. The short form is EXPLAIN RAW PLAN (which renders an HIR plan). Once we lower to MIR we lower a lot of high-level constructs into blobs of MIR nodes which makes the plans with outer joins and global aggregates much longer. There were previous attempts to add more operators to MIR but I don't think we have consensus that this is a good idea.
- Mark min/max and topk that it's expensive. (maybe surround with ***, maybe collect them at the top or end, or explain insights )
As with differential joins, I propose to make this part of an "Insights" section at the end of the plan:
```
Insights:
- Potentially memory-expensive `max` aggregation.
```
Alternatively, we can make this an optimizer notice and render it under Notices.
- Creating index on something that's filtered and/or projected. Take this into account in plan insights and/or write it down somewhere how to figure out whether you should create the index.
Can you expand on this? Do you mean suggesting indexes that would speed up the implementation of a Filter or a Join?

est 3 weeks

I doubt that we can fit all of this work over three weeks. I'll start with the two tasks that can be immediately implemented, and we'll have to revisit the estimates of the rest once we have a better idea and broad agreement how to approach them.
ggevay commented 4 months ago

Eliminate showing the ArrangeBy that is part of a ReadIndex

I'd like to take this one, because I'll be modifying related code in the next few weeks.

lfest commented 4 months ago

There's a lot of good stuff here to respond to but at the moment what I wanted to say is that I really, really like the idea of an "Insights" section at the end. Let me run that by Field Eng, too, and see what they think.

lfest commented 4 months ago

Or actually maybe we put Insights at the beginning. That gives a preview of what to look for/focus on. Anyways, let's see what others say.
- © Githubissues.
- Githubissues is a development platform for aggregating issues.

MaterializeInc / materialize

[Epic] Make `EXPLAIN` powerful enough to easily troubleshoot #26919

Product outcome