cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0
2.04k stars 622 forks source link

Invalid Index Scan Plan Selection with Multi-column Indexes #1299

Open vkonagar opened 6 years ago

vkonagar commented 6 years ago

As part of our class project "Index Selection", we are trying to generate multi-column hypothetical indexes and checking the cost of using them for a query.

The optimizer does not select a valid plan with multi-column indexes. If we create a multiple-column index say Index1 on columns say (a, b, c) on a table T, and if a query has a where clause on only (b), then the optimizer chooses Index1 in its best plan for the query. Same goes with (c) as well. Ideally, Index1 should only help queries with (a) or (a, b) or (a, b, c) as the where clause predicates. Surprisingly, even with the wrong plan, the query executed successfully.

Steps to reproduce:

GustavoAngulo commented 6 years ago

https://github.com/cmu-db/peloton/blob/78b85dc21d5f1d7b09b150380ecec22b92ea804e/src/optimizer/rule_impls.cpp#L241-L253

This is probably the cause of the issue. The Check function essentially checks if we can use an index scan for a "Get" operator. We're only checking the existence of an index, not if we can actually use the index if there is one. I can probably take a look.

vkonagar commented 6 years ago

Gustavo, Thanks! I looked at the code you pointed and I think the issue is here.

https://github.com/cmu-db/peloton/blob/78b85dc21d5f1d7b09b150380ecec22b92ea804e/src/optimizer/rule_impls.cpp#L394

Here, we are just checking if the given column is present in the column set of the index object but not enforcing any order in the matches. Instead, we should take the vector of column oids present in index catalog object (this preserves multi-column order as specified by the user) and check in the given order if there is any match with the columns given in the query predicates.

Something like this?

// Find match index for the predicates
auto index_objects = get->table->GetIndexObjects();
for (auto &index_id_object_pair : index_objects) {
  auto &index_id = index_id_object_pair.first;
  auto &index_object = index_id_object_pair.second;
  std::vector<oid_t> index_key_column_id_list;
  std::vector<ExpressionType> index_expr_type_list;
  std::vector<type::Value> index_value_list;

  // Only pick the index if the query columns match the index's columns in order.
  auto index_id_list = index_object->GetKeyAttrs();
  for (size_t offset = 0; (offset < key_column_id_list.size()) && (offset < index_id_list.size()); offset++) {
    auto col_id = key_column_id_list[offset];
    if (index_id_list[offset] == col_id) {
      index_key_column_id_list.push_back(col_id);
      index_expr_type_list.push_back(expr_type_list[offset]);
      index_value_list.push_back(value_list[offset]);
    } else {
      break;
    }
  }

  // Add transformed plan
  if (!index_key_column_id_list.empty()) {
    auto index_scan_op = PhysicalIndexScan::make(
        get->get_id, get->table, get->table_alias, get->predicates,
        get->is_for_update, index_id, index_key_column_id_list,
        index_expr_type_list, index_value_list);
    transformed.push_back(
        std::make_shared<OperatorExpression>(index_scan_op));
  }
}

This seems to work.