dbgroup-at-ucsc / dbtune

This research project aims to develop tools in order to make index tuning easier and more effective.
http://users.soe.ucsc.edu/~alkis/tuning/
Other
5 stars 3 forks source link

The under-estimation of INUM #248

Closed tqtrung closed 12 years ago

tqtrung commented 12 years ago

In some experiments of DivBIP, I realized the total cost metric returned by INUM is lower than the total cost if we run the same set of queries (with the same configuration) on DB2Optimzer. The ratio of totalCost(DB2Optimizer) / totalCost(INUM) is somewhere 1.2 to 1.6. (We expect totalCost(DB2Optimizer) <= totalCost(INUM).

Below includes a specific example for the second query in the TPCDS workload (@ID=TEMPLATE_07):

-- @ID=TEMPLATE_07 SELECT i_item_id, AVG(ss_quantity) agg1, AVG(ss_list_price) agg2, AVG(ss_coupon_amt) agg3, AVG(ss_sales_price) agg4 FROM tpcds.store_sales, tpcds.customer_demographics, tpcds.date_dim, tpcds.item, tpcds.promotion WHERE ss_sold_date_sk = d_date_sk AND ss_item_sk = i_item_sk AND ss_cdemo_sk = cd_demo_sk AND ss_promo_sk = p_promo_sk AND cd_gender = 'F' AND cd_marital_status = 'M' AND cd_education_status = 'College' AND ( p_channel_email = 'N' OR p_channel_event = 'N' ) AND d_year = 2001 GROUP BY i_item_id ORDER BY i_item_id;

[+TPCDS.CUSTOMER.C_CUSTOMER_SK(A)+TPCDS.CUSTOMER.C_CURRENT_ADDR_SK(A)]

[+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_MOY(A)+TPCDS.DATE_DIM.D_DATE_SK(A)]

[+TPCDS.ITEM.I_ITEM_ID(A)+TPCDS.ITEM.I_ITEM_SK(A)]

[+TPCDS.CUSTOMER_ADDRESS.CA_ADDRESS_SK(A)+TPCDS.CUSTOMER_ADDRESS.CA_ZIP(A)]

[+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_DATE_SK(A)]

[+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A) +TPCDS.PROMOTION.P_PROMO_SK(A)]

[+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A) +TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A) +TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A) +TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(A)]

[+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)+TPCDS.STORE_SALES.SS_SALES_PRICE(A) +TPCDS.STORE_SALES.SS_COUPON_AMT(A)+TPCDS.STORE_SALES.SS_LIST_PRICE(A) +TPCDS.STORE_SALES.SS_QUANTITY(A)+TPCDS.STORE_SALES.SS_PROMO_SK(A) +TPCDS.STORE_SALES.SS_CDEMO_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)]

[+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)+TPCDS.STORE_SALES.SS_EXT_SALES_PRICE(A) +TPCDS.STORE_SALES.SS_STORE_SK(A)+TPCDS.STORE_SALES.SS_CUSTOMER_SK(A) +TPCDS.STORE_SALES.SS_ITEM_SK(A)]

tqtrung commented 12 years ago
The first template plan, with the internal plan cost: 36007

  RETURN(cost=76028.078125 rows=0 object=NONE fetch=NONE)
└── GRPBY(cost=76028.046875 rows=0 object=NONE fetch=NONE)
    └── TABLE.SCAN(cost=76028.0390625 rows=0 object=NONE fetch=NONE)
        └── SORT(cost=76028.03125 rows=0 object=NONE fetch=NONE)
            └── HASH.JOIN(cost=76027.96875 rows=0 object=NONE fetch=NONE)
                ├── HASH.JOIN(cost=73528.671875 rows=0 object=NONE fetch=NONE)
                │   ├── HASH.JOIN(cost=36891.484375 rows=0 object=NONE fetch=NONE)
                │   │   ├── NESTED.LOOP.JOIN(cost=36769.70703125 rows=0 object=NONE fetch=NONE)
                │   │   │   ├── TABLE.SCAN(cost=2643.62255859375 rows=0 object=NONE fetch=NONE)
                │   │   │   │   └── SORT(cost=2643.622314453125 rows=0 object=NONE fetch=NONE)
                │   │   │   │       └── TABLE.ACCESS.SLOT(cost=2643.558349609375 rows=73049 object=TPCDS.DATE_DIM fetch=[+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_DATE_SK(A)])
                │   │   │   └── TABLE.ACCESS.SLOT(cost=81.6527328491211 rows=2880404 object=[+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)] fetch=[+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(D)+TPCDS.STORE_SALES.SS_SALES_PRICE(A)+TPCDS.STORE_SALES.SS_COUPON_AMT(A)+TPCDS.STORE_SALES.SS_LIST_PRICE(A)+TPCDS.STORE_SALES.SS_QUANTITY(A)+TPCDS.STORE_SALES.SS_PROMO_SK(A)+TPCDS.STORE_SALES.SS_CDEMO_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)])
                │   │   └── TABLE.ACCESS.SLOT(cost=121.15773010253906 rows=300 object=TPCDS.PROMOTION fetch=[+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)])
                │   └── TABLE.ACCESS.SLOT(cost=34675.6953125 rows=1920800 object=TPCDS.CUSTOMER_DEMOGRAPHICS fetch=[+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(A)])
                └── TABLE.ACCESS.SLOT(cost=2498.658203125 rows=18000 object=TPCDS.ITEM fetch=[+TPCDS.ITEM.I_ITEM_ID(A)+TPCDS.ITEM.I_ITEM_SK(A)])

 The second template plan with the internal plan cost: 1176

RETURN(cost=95730.2265625 rows=0 object=NONE fetch=NONE)
└── GRPBY(cost=95730.1953125 rows=0 object=NONE fetch=NONE)
    └── TABLE.SCAN(cost=95730.1875 rows=0 object=NONE fetch=NONE)
        └── SORT(cost=95730.1796875 rows=0 object=NONE fetch=NONE)
            └── HASH.JOIN(cost=95730.1171875 rows=0 object=NONE fetch=NONE)
                ├── SORT.MERGE.JOIN(cost=94566.296875 rows=0 object=NONE fetch=NONE)
                │   ├── FILTER(cost=92130.8203125 rows=0 object=NONE fetch=NONE)
                │   │   └── TABLE.SCAN(cost=92130.8203125 rows=0 object=NONE fetch=NONE)
                │   │       └── SORT(cost=91996.7109375 rows=0 object=NONE fetch=NONE)
                │   │           └── HASH.JOIN(cost=91061.8515625 rows=0 object=NONE fetch=NONE)
                │   │               ├── HASH.JOIN(cost=91038.3828125 rows=0 object=NONE fetch=NONE)
                │   │               │   ├── TABLE.ACCESS.SLOT(cost=90924.1875 rows=2880404 object=TPCDS.STORE_SALES fetch=[+TPCDS.STORE_SALES.SS_SALES_PRICE(A)+TPCDS.STORE_SALES.SS_COUPON_AMT(A)+TPCDS.STORE_SALES.SS_LIST_PRICE(A)+TPCDS.STORE_SALES.SS_QUANTITY(A)+TPCDS.STORE_SALES.SS_PROMO_SK(A)+TPCDS.STORE_SALES.SS_CDEMO_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)])
                │   │               │   └── TABLE.ACCESS.SLOT(cost=12.774245262145996 rows=73049 object=[+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_DATE_SK(A)] fetch=[+TPCDS.DATE_DIM.D_DATE_SK(D)+TPCDS.DATE_DIM.D_YEAR(A)])
                │   │               └── TABLE.ACCESS.SLOT(cost=22.84453010559082 rows=300 object=[+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)] fetch=[+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)])
                │   └── TABLE.ACCESS.SLOT(cost=2431.15966796875 rows=1920800 object=[+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(A)] fetch=[+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(D)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)])
                └── TABLE.ACCESS.SLOT(cost=1163.1761474609375 rows=18000 object=[+TPCDS.ITEM.I_ITEM_ID(A)+TPCDS.ITEM.I_ITEM_SK(A)] fetch=[+TPCDS.ITEM.I_ITEM_ID(D)+TPCDS.ITEM.I_ITEM_SK(D)])

 The third template plan with the internal plan cost: 2202

RETURN(cost=133065.40625 rows=0 object=NONE fetch=NONE)
└── GRPBY(cost=133065.375 rows=0 object=NONE fetch=NONE)
    └── TABLE.SCAN(cost=133065.359375 rows=0 object=NONE fetch=NONE)
        └── SORT(cost=133065.34375 rows=0 object=NONE fetch=NONE)
            └── HASH.JOIN(cost=133065.28125 rows=0 object=NONE fetch=NONE)
                ├── SORT.MERGE.JOIN(cost=130565.9765625 rows=0 object=NONE fetch=NONE)
                │   ├── TABLE.SCAN(cost=94859.9140625 rows=0 object=NONE fetch=NONE)
                │   │   └── SORT(cost=94725.8046875 rows=0 object=NONE fetch=NONE)
                │   │       └── HASH.JOIN(cost=93790.9453125 rows=0 object=NONE fetch=NONE)
                │   │           ├── HASH.JOIN(cost=93669.1640625 rows=0 object=NONE fetch=NONE)
                │   │           │   ├── TABLE.ACCESS.SLOT(cost=90924.1875 rows=2880404 object=TPCDS.STORE_SALES fetch=[+TPCDS.STORE_SALES.SS_SALES_PRICE(A)+TPCDS.STORE_SALES.SS_COUPON_AMT(A)+TPCDS.STORE_SALES.SS_LIST_PRICE(A)+TPCDS.STORE_SALES.SS_QUANTITY(A)+TPCDS.STORE_SALES.SS_PROMO_SK(A)+TPCDS.STORE_SALES.SS_CDEMO_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)])
                │   │           │   └── TABLE.ACCESS.SLOT(cost=2643.558349609375 rows=73049 object=TPCDS.DATE_DIM fetch=[+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_DATE_SK(A)])
                │   │           └── TABLE.ACCESS.SLOT(cost=121.15773010253906 rows=300 object=TPCDS.PROMOTION fetch=[+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)])
                │   └── FILTER(cost=35421.1015625 rows=0 object=NONE fetch=NONE)
                │       └── TABLE.SCAN(cost=35421.1015625 rows=0 object=NONE fetch=NONE)
                │           └── SORT(cost=35322.1328125 rows=0 object=NONE fetch=NONE)
                │               └── TABLE.ACCESS.SLOT(cost=34675.6953125 rows=1920800 object=TPCDS.CUSTOMER_DEMOGRAPHICS fetch=[+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(A)])
                └── TABLE.ACCESS.SLOT(cost=2498.658203125 rows=18000 object=TPCDS.ITEM fetch=[+TPCDS.ITEM.I_ITEM_ID(A)+TPCDS.ITEM.I_ITEM_SK(A)])
npolyzotis commented 12 years ago

It 'd be great if we can include a methodology to reproduce the bug as part of this issue. For instance, having the SQL query text, and then the steps necessary to observe the lower INUM cost.

On Mar 20, 2012, at 9:34 AM, tqtrung wrote:

In some experiments of DivBIP, I realized the total cost metric returned by INUM is lower than the total cost if we run the same set of queries (with the same configuration) on DB2Optimzer. The ratio of totalCost(DB2Optimizer) / totalCost(INUM) is somewhere 1.2 to 1.6. (We expect totalCost(DB2Optimizer) <= totalCost(INUM).

Below includes a specific example for the second query in the workload (@ID=TEMPLATE_07):

  • Configuration: [+TPCDS.ITEM.I_MANAGER_ID(A)+TPCDS.ITEM.I_MANUFACT(A)+TPCDS.ITEM.I_MANUFACT_ID(A)+TPCDS.ITEM.I_BRAND_ID(A)+TPCDS.ITEM.I_BRAND(A)+TPCDS.ITEM.I_ITEM_SK(A)] 11 [+TPCDS.CUSTOMER.C_CUSTOMER_SK(A)+TPCDS.CUSTOMER.C_CURRENT_ADDR_SK(A)] 10 [+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_MOY(A)+TPCDS.DATE_DIM.D_DATE_SK(A)] 12 [+TPCDS.ITEM.I_ITEM_ID(A)+TPCDS.ITEM.I_ITEM_SK(A)] 5 [+TPCDS.CUSTOMER_ADDRESS.CA_ADDRESS_SK(A)+TPCDS.CUSTOMER_ADDRESS.CA_ZIP(A)] 9 [+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_DATE_SK(A)] 6 [+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)] 4 [+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(A)] 7 [+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)+TPCDS.STORE_SALES.SS_SALES_PRICE(A)+TPCDS.STORE_SALES.SS_COUPON_AMT(A)+TPCDS.STORE_SALES.SS_LIST_PRICE(A)+TPCDS.STORE_SALES.SS_QUANTITY

(A)+TPCDS.STORE_SALES.SS_PROMO_SK(A)+TPCDS.STORE_SALES.SS_CDEMO_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)] 8 [+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)+TPCDS.STORE_SALES.SS_EXT_SALES_PRICE(A)+TPCDS.STORE_SALES.SS_STORE_SK(A)+TPCDS.STORE_SALES.SS_CUSTOMER_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)] 13


Reply to this email directly or view it on GitHub: https://github.com/dbgroup-at-ucsc/dbtune/issues/248

tqtrung commented 12 years ago
 The instantiated plan by INUM with COST: 4828.749740600586

RETURN(cost=4828.749740600586 rows=0 object=NONE fetch=NONE)
└── GRPBY(cost=95730.1953125 rows=0 object=NONE fetch=NONE)
    └── TABLE.SCAN(cost=95730.1875 rows=0 object=NONE fetch=NONE)
        └── SORT(cost=95730.1796875 rows=0 object=NONE fetch=NONE)
            └── HASH.JOIN(cost=95730.1171875 rows=0 object=NONE fetch=NONE)
                ├── SORT.MERGE.JOIN(cost=94566.296875 rows=0 object=NONE fetch=NONE)
                │   ├── FILTER(cost=92130.8203125 rows=0 object=NONE fetch=NONE)
                │   │   └── TABLE.SCAN(cost=92130.8203125 rows=0 object=NONE fetch=NONE)
                │   │       └── SORT(cost=91996.7109375 rows=0 object=NONE fetch=NONE)
                │   │           └── HASH.JOIN(cost=91061.8515625 rows=0 object=NONE fetch=NONE)
                │   │               ├── HASH.JOIN(cost=91038.3828125 rows=0 object=NONE fetch=NONE)
                │   │               │   ├── INDEX.SCAN(cost=12.774245262145996 rows=73049 object=[+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_DATE_SK(A)] fetch=[+TPCDS.DATE_DIM.D_DATE_SK(D)+TPCDS.DATE_DIM.D_YEAR(A)])
                │   │               │   └── INDEX.SCAN(cost=22.710678100585938 rows=2880404 object=[+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)+TPCDS.STORE_SALES.SS_SALES_PRICE(A)+TPCDS.STORE_SALES.SS_COUPON_AMT(A)+TPCDS.STORE_SALES.SS_LIST_PRICE(A)+TPCDS.STORE_SALES.SS_QUANTITY(A)+TPCDS.STORE_SALES.SS_PROMO_SK(A)+TPCDS.STORE_SALES.SS_CDEMO_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)] fetch=[+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(D)+TPCDS.STORE_SALES.SS_SALES_PRICE(D)+TPCDS.STORE_SALES.SS_COUPON_AMT(D)+TPCDS.STORE_SALES.SS_LIST_PRICE(D)+TPCDS.STORE_SALES.SS_QUANTITY(D)+TPCDS.STORE_SALES.SS_PROMO_SK(D)+TPCDS.STORE_SALES.SS_CDEMO_SK(D)+TPCDS.STORE_SALES.SS_ITEM_SK(D)])
                │   │               └── INDEX.SCAN(cost=22.84453010559082 rows=300 object=[+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)] fetch=[+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)])
                │   └── INDEX.SCAN(cost=2431.15966796875 rows=1920800 object=[+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(A)] fetch=[+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(D)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)])
                └── INDEX.SCAN(cost=1163.1761474609375 rows=18000 object=[+TPCDS.ITEM.I_ITEM_ID(A)+TPCDS.ITEM.I_ITEM_SK(A)] fetch=[+TPCDS.ITEM.I_ITEM_ID(D)+TPCDS.ITEM.I_ITEM_SK(D)])
tqtrung commented 12 years ago
 The plan instantiated by DB2 with cost: 11924

 RETURN(cost=11924.9111328125 rows=0 object=NONE fetch=NONE)
└── GRPBY(cost=11924.8759765625 rows=0 object=NONE fetch=NONE)
    └── TABLE.SCAN(cost=11924.8662109375 rows=0 object=NONE fetch=NONE)
        └── SORT(cost=11924.85546875 rows=0 object=NONE fetch=NONE)
            └── HASH.JOIN(cost=11924.791015625 rows=0 object=NONE fetch=NONE)
                ├── INDEX.SCAN(cost=1163.1761474609375 rows=18000 object=[+TPCDS.ITEM.I_ITEM_ID(A)+TPCDS.ITEM.I_ITEM_SK(A)] fetch=[+TPCDS.ITEM.I_ITEM_ID(D)+TPCDS.ITEM.I_ITEM_SK(D)])
                └── HASH.JOIN(cost=10760.978515625 rows=0 object=NONE fetch=NONE)
                    ├── HASH.JOIN(cost=10738.11328125 rows=0 object=NONE fetch=NONE)
                    │   ├── NESTED.LOOP.JOIN(cost=8303.818359375 rows=0 object=NONE fetch=NONE)
                    │   │   ├── INDEX.SCAN(cost=12.774245262145996 rows=73049 object=[+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_DATE_SK(A)] fetch=[+TPCDS.DATE_DIM.D_DATE_SK(D)+TPCDS.DATE_DIM.D_YEAR(A)])
                    │   │   └── INDEX.SCAN(cost=22.710678100585938 rows=2880404 object=[+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)+TPCDS.STORE_SALES.SS_SALES_PRICE(A)+TPCDS.STORE_SALES.SS_COUPON_AMT(A)+TPCDS.STORE_SALES.SS_LIST_PRICE(A)+TPCDS.STORE_SALES.SS_QUANTITY(A)+TPCDS.STORE_SALES.SS_PROMO_SK(A)+TPCDS.STORE_SALES.SS_CDEMO_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)] fetch=[+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(D)+TPCDS.STORE_SALES.SS_SALES_PRICE(D)+TPCDS.STORE_SALES.SS_COUPON_AMT(D)+TPCDS.STORE_SALES.SS_LIST_PRICE(D)+TPCDS.STORE_SALES.SS_QUANTITY(D)+TPCDS.STORE_SALES.SS_PROMO_SK(D)+TPCDS.STORE_SALES.SS_CDEMO_SK(D)+TPCDS.STORE_SALES.SS_ITEM_SK(D)])
                    │   └── INDEX.SCAN(cost=2431.15966796875 rows=1920800 object=[+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(A)] fetch=[+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(D)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)])
                    └── INDEX.SCAN(cost=22.84453010559082 rows=300 object=[+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)] fetch=[+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)])
tqtrung commented 12 years ago

I am not sure if you mean this? I've put this in the issue.

This test case can be repeated by running DivBIPTest with: #replicas = 3, load balance factor = 2, # query = 3 (the first 3 queries in tpcds-small),

at most 10 indexes are allowed on each replica.

On Tue, Mar 20, 2012 at 9:47 AM, Neoklis Polyzotis < reply@reply.github.com

wrote:

It 'd be great if we can include a methodology to reproduce the bug as part of this issue. For instance, having the SQL query text, and then the steps necessary to observe the lower INUM cost.

On Mar 20, 2012, at 9:34 AM, tqtrung wrote:

In some experiments of DivBIP, I realized the total cost metric returned by INUM is lower than the total cost if we run the same set of queries (with the same configuration) on DB2Optimzer. The ratio of totalCost(DB2Optimizer) / totalCost(INUM) is somewhere 1.2 to 1.6. (We expect totalCost(DB2Optimizer) <= totalCost(INUM).

Below includes a specific example for the second query in the workload (@ID=TEMPLATE_07):

  • Configuration: [+TPCDS.ITEM.I_MANAGER_ID(A)+TPCDS.ITEM.I_MANUFACT(A)+TPCDS.ITEM.I_MANUFACT_ID(A)+TPCDS.ITEM.I_BRAND_ID(A)+TPCDS.ITEM.I_BRAND(A)+TPCDS.ITEM.I_ITEM_SK(A)] 11 [+TPCDS.CUSTOMER.C_CUSTOMER_SK(A)+TPCDS.CUSTOMER.C_CURRENT_ADDR_SK(A)] 10 [+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_MOY(A)+TPCDS.DATE_DIM.D_DATE_SK(A)] 12 [+TPCDS.ITEM.I_ITEM_ID(A)+TPCDS.ITEM.I_ITEM_SK(A)] 5 [+TPCDS.CUSTOMER_ADDRESS.CA_ADDRESS_SK(A)+TPCDS.CUSTOMER_ADDRESS.CA_ZIP(A)] 9 [+TPCDS.DATE_DIM.D_YEAR(A)+TPCDS.DATE_DIM.D_DATE_SK(A)] 6 [+TPCDS.PROMOTION.P_CHANNEL_EVENT(A)+TPCDS.PROMOTION.P_CHANNEL_EMAIL(A)+TPCDS.PROMOTION.P_PROMO_SK(A)] 4 [+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_EDUCATION_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_MARITAL_STATUS(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_GENDER(A)+TPCDS.CUSTOMER_DEMOGRAPHICS.CD_DEMO_SK(A)] 7 [+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)+TPCDS.STORE_SALES.SS_SALES_PRICE(A)+TPCDS.STORE_SALES.SS_COUPON_AMT(A)+TPCDS.STORE_SALES.SS_LIST_PRICE(A)+TPCDS.STORE_SALES.SS_QUANTI

TY

(A)+TPCDS.STORE_SALES.SS_PROMO_SK(A)+TPCDS.STORE_SALES.SS_CDEMO_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)] 8 [+TPCDS.STORE_SALES.SS_SOLD_DATE_SK(A)+TPCDS.STORE_SALES.SS_EXT_SALES_PRICE(A)+TPCDS.STORE_SALES.SS_STORE_SK(A)+TPCDS.STORE_SALES.SS_CUSTOMER_SK(A)+TPCDS.STORE_SALES.SS_ITEM_SK(A)] 13


Reply to this email directly or view it on GitHub: https://github.com/dbgroup-at-ucsc/dbtune/issues/248


Reply to this email directly or view it on GitHub: https://github.com/dbgroup-at-ucsc/dbtune/issues/248#issuecomment-4599791

npolyzotis commented 12 years ago

Trung, the formatting of the plans is a bit weird. Can you try to put them in verbatim mode? You just have to indent them, as follows:

This is in verbatim mode.

Just try to edit my comment and you will see how this formatting works.

Test.