dalibo / pev2

Postgres Explain Visualizer 2
https://explain.dalibo.com
PostgreSQL License
2.49k stars 119 forks source link

Materalize row count query. #635

Open tolomea opened 4 months ago

tolomea commented 4 months ago

This is version 1.9.1

I don't understand explain output that well, I'm still learning.

I have an explain from Postgres that includes

->  Nested Loop Semi Join  (cost=105120.95..1407009.98 rows=1 width=254) (actual time=9567.961..183600.927 rows=11768 loops=1)
    Join Filter: (REDACTED)
    Rows Removed by Join Filter: 1196949001
    ->  Nested Loop  (cost=104809.76..1404697.63 rows=2798 width=286) (actual time=8337.268..65035.557 rows=205377 loops=1)
    ->  Materialize  (cost=311.19..2010.18 rows=27 width=16) (actual time=0.001..0.292 rows=5828 loops=205377)
        ->  Nested Loop  (cost=311.19..2010.15 rows=27 width=16) (actual time=106.768..1213.133 rows=6000 loops=1)

In the UI these four nodes look like: image

1: The rows and estimated rows on the materialize surprise me they seem to have been multiplied by loops. Is this indicating that the nested loop semi join is reading the whole materialized table 205377 times?

2: I note that that's the number of rows coming up to the other side of the semi join from the nested loop. It's estimated rows is much lower which suggests that if 1 is correct and expected, then maybe for the estimated rows on the materialize it should be multiplying by the estimated rows off the nested loop, not the actual loops from the materialize.

tolomea commented 4 months ago

Also I have another version of the same query where instead of materialize and nested loop semi join it's used a hash and hash semi join. The hash only has a loop count of 1, so it doesn't pump up the rows like this. That makes it look much better, but is it really? As far as I can see the planner has done that because it's row estimates that time were much less accurate leading it to think there would only be 27 rows on the left side of this join, not 205,377. Suggesting that the planner thinks the materialize is better when there are more rows. Although the runtimes contradict that idea so 🤷