cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.18k stars 3.82k forks source link

Performance of Query 7 on TPC-H Benchmark #135001

Open bajinsheng opened 1 week ago

bajinsheng commented 1 week ago

Describe the problem

For query 7 in TPC-H benchmark:

select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
    (
        select
            n1.n_name as supp_nation,
            n2.n_name as cust_nation,
            extract(
                year
                from
                    l_shipdate
            ) as l_year,
            l_extendedprice * (1 - l_discount) as volume
        from
            SUPPLIER,
            LINEITEM,
            ORDERS,
            CUSTOMER,
            NATION n1,
            NATION n2
        where
            s_suppkey = l_suppkey
            and o_orderkey = l_orderkey
            and c_custkey = o_custkey
            and s_nationkey = n1.n_nationkey
            and c_nationkey = n2.n_nationkey
            and (
                (
                    n1.n_name = 'JAPAN'
                    and n2.n_name = 'INDIA'
                )
                or (
                    n1.n_name = 'INDIA'
                    and n2.n_name = 'JAPAN'
                )
            )
            and l_shipdate between date '1995-01-01'
            and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

Its execution time is 15.7s, as shown in its query plan original.txt

I found that negatingthe following IF code block brings a significant performance improvement:

diff --git a/pkg/sql/opt/memo/statistics_builder.go b/pkg/sql/opt/memo/statistics_builder.go
index 5e3bd1d1a5e..bf9edfab2b2 100644
--- a/pkg/sql/opt/memo/statistics_builder.go
+++ b/pkg/sql/opt/memo/statistics_builder.go
@@ -1404,7 +1404,7 @@ func (sb *statisticsBuilder) buildJoin(

        default:
                s.RowCount = leftStats.RowCount * rightStats.RowCount
-               if h.rightProps.FuncDeps.ColsAreStrictKey(h.selfJoinCols) {
+               if !h.rightProps.FuncDeps.ColsAreStrictKey(h.selfJoinCols) {
                        // This is like an index join, so apply a selectivity that will result
                        // in leftStats.RowCount rows.
                        if rightStats.RowCount != 0 {

Its execution time is reduced to 5.4s, as shown in the new query plan derived.txt

I am not proposing a fixing patch. Instead, I believe this case shows that the optimizer produces a more efficient query plan based on the current implementation. I wonder whether we can optimize the code anywhere to enable the second query plan.

To Reproduce

cockroach start-single-node --insecure --store=/app/data --listen-addr=0.0.0.0:36257 --sql-addr=0.0.0.0:26257
cockroach workload init tpch

Then execute the above query.

Environment:

Jira issue: CRDB-44256

Jira issue: CRDB-44329

blathers-crl[bot] commented 1 week ago

Hi @bajinsheng, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] commented 1 week ago

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

If we have not gotten back to your issue within a few business days, you can try the following:

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

mw5h commented 6 days ago

Thanks for this report!