facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.
http://myrocks.io

MyRocks 8.0.28 has poor performance of primary key range query #1315

Open yeqing12 opened 1 year ago

yeqing12 commented 1 year ago

The problem:

[Screenshot 2023-06-05 15:24:27]

Reproduction method:

$ sysbench --version
sysbench 1.0.17

Use sysbench to create one table and insert 1,000,000 rows:

sysbench /usr/share/sysbench/oltp_insert.lua --mysql-host=xx --mysql-port=xx --db-driver=mysql --mysql-user=xx --mysql-password=xx --tables=1 --table-size=1000000 --threads=8 --time=600 --report-interval=10 --mysql_storage_engine=rocksdb prepare

sql:

select id from sbtest1 where id >= 300000 order by id asc limit 1;      // poor
select id, pad from sbtest1 where id >= 300000 order by id asc limit 1; // ok
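One way to confirm which plan each statement gets (assuming access to the server) is to compare their EXPLAIN output; based on the cost comparison described below, the slow form is expected to show a full covering index scan rather than a range access on the primary key:

```sql
EXPLAIN SELECT id FROM sbtest1 WHERE id >= 300000 ORDER BY id ASC LIMIT 1;
EXPLAIN SELECT id, pad FROM sbtest1 WHERE id >= 300000 ORDER BY id ASC LIMIT 1;
```

The optimizer trace (the "best_covering_index_scan" object written by test_quick_select, shown below) also records which alternative was chosen and why.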

InnoDB does not have this problem.

The relevant code for RocksDB is:

// for rocksdb, the cost of range scan in ha_rocksdb.cc
double ha_rocksdb::read_time(uint index, uint ranges, ha_rows rows) {
  DBUG_ENTER_FUNC();

  if (index != table->s->primary_key) {
    /* Non covering index range scan */
    DBUG_RETURN(handler::read_time(index, ranges, rows));
  }

  DBUG_RETURN((rows / 20.0) + 1);
}

For the SQL optimizer, in sql/range_optimizer/range_optimizer.cc:

// calc the cost of range scan, compare full index cost and range scan cost
int test_quick_select()
{
   ...
     /* Calculate cost of full index read for the shortest covering index */
  if (!table->covering_keys.is_clear_all() &&
      /*
        If optimizer_force_index_for_range is on and force index is used,
        then skip calculating index scan cost.
      */
      !(thd->variables.optimizer_force_index_for_range && table->force_index)) {
    int key_for_use = find_shortest_key(table, &table->covering_keys);
    // find_shortest_key() should return a valid key:
    assert(key_for_use != MAX_KEY);

    Cost_estimate key_read_time = param.table->file->index_scan_cost(
        key_for_use, 1, static_cast<double>(records));
    key_read_time.add_cpu(
        cost_model->row_evaluate_cost(static_cast<double>(records)));

    bool chosen = false;
    if (key_read_time < cost_est) {
      cost_est = key_read_time;
      chosen = true;
    }

    Opt_trace_object trace_cov(trace, "best_covering_index_scan",
                               Opt_trace_context::RANGE_OPTIMIZER);
    trace_cov.add_utf8("index", table->key_info[key_for_use].name)
        .add("cost", key_read_time)
        .add("chosen", chosen);
    if (!chosen) trace_cov.add_alnum("cause", "cost");
  }

  AccessPath *best_path = nullptr;
  double best_cost = cost_est.total_cost();
  ...
}

I have tried two different approaches to solve this problem, but I'm not sure whether these two directions are suitable or whether they will cause other problems. I hope the maintainers can come up with a solution or help evaluate these two approaches.

The first approach: do not use the full index read cost to eliminate the range scan.

--- a/sql/range_optimizer/range_optimizer.cc
+++ b/sql/range_optimizer/range_optimizer.cc
@@ -616,7 +616,6 @@ int test_quick_select(THD *thd, MEM_ROOT *return_mem_root,
   const bool index_merge_intersect_allowed =
       index_merge_allowed &&
       thd->optimizer_switch_flag(OPTIMIZER_SWITCH_INDEX_MERGE_INTERSECT);

   /* Calculate cost of full index read for the shortest covering index */
   if (!table->covering_keys.is_clear_all() &&
       /*
@@ -624,7 +623,7 @@ int test_quick_select(THD *thd, MEM_ROOT *return_mem_root,
         then skip calculating index scan cost.
       */
       !(thd->variables.optimizer_force_index_for_range && table->force_index)) {
-    int key_for_use = find_shortest_key(table, &table->covering_keys);
+/*    int key_for_use = find_shortest_key(table, &table->covering_keys);
     // find_shortest_key() should return a valid key:
     assert(key_for_use != MAX_KEY);

@@ -645,8 +644,8 @@ int test_quick_select(THD *thd, MEM_ROOT *return_mem_root,
         .add("cost", key_read_time)
         .add("chosen", chosen);
     if (!chosen) trace_cov.add_alnum("cause", "cost");
+*/
   }

The second approach: change the cost estimate of the range scan.

--- a/storage/rocksdb/ha_rocksdb.cc
+++ b/storage/rocksdb/ha_rocksdb.cc
@@ -18245,7 +18245,7 @@ double ha_rocksdb::read_time(uint index, uint ranges, ha_rows rows) {
     DBUG_RETURN(handler::read_time(index, ranges, rows));
   }

-  DBUG_RETURN((rows / 20.0) + 1);
+  DBUG_RETURN((rows / 100.0) + 1);
 }

The above are just two simple patches illustrating the two approaches. After testing, both of them resolve the problem above.

[Screenshot 2023-06-05 16:04:51]
luqun commented 1 year ago

Thanks for reporting. We will investigate the issue and give an update later.