facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.
http://myrocks.io

point lookups much slower in MyRocks than in InnoDB #677

Open mdcallag opened 7 years ago

mdcallag commented 7 years ago

Point lookups are much slower in MyRocks than in InnoDB for low-concurrency workloads. They can also suffer from mutex contention, but I filed https://github.com/facebook/mysql-5.6/issues/674 for that.

Here are results from 3 of the tests I get from sysbench. The tests are point-query, random-points and hot-points.

I tested MyRocks builds from Feb10 and Jun16, and upstream InnoDB 5.6.35, 5.7.17 and 8.0.1. The server is a Core i5 NUC with 2 CPUs, 4 HW threads, 16 GB of RAM and a fast SSD. The server is shared by sysbench and mysqld. Tests are run for 1 and 2 clients. The database is cached by InnoDB and MyRocks. The test database has 4 tables and 1M rows per table.

Results are below. The columns are for 1 and 2 concurrent clients:

point-query
1       2       concurrency
15093   27471   myrocks.feb10
12846   23244   myrocks.jun16
20783   36149   inno5635.pt1
18365   32579   inno5717.pt1
16664   30360   inno801.pt1

random-points
1       2       concurrency
 908     1780   myrocks.feb10
 699     1392   myrocks.jun16
3203     6019   inno5635.pt1
3178     6312   inno5717.pt1
3028     6047   inno801.pt1

hot-points
1       2       concurrency
1224     2438   myrocks.feb10
 919     1811   myrocks.jun16
4108     7892   inno5635.pt1
3893     7794   inno5717.pt1
3693     7384   inno801.pt1

I use the all_small.sh script from: https://github.com/mdcallag/mytools/tree/master/bench/sysbench.lua

And sysbench from https://github.com/mdcallag/sysbench

Assuming sysbench is installed at /me/sysbench10 and InnoDB 8.0.2 is at /me/inno802, this is an example command line:

bash all_small.sh 4 1000000 600 600 300 innodb 1 0 /me/inno802/bin/mysql none /me/sysbench10

mdcallag commented 7 years ago

Using an abbreviated version of the test on my dev server (which has more noise from "value added" things running in the background), the random-points workload gets 750 QPS from MyRocks.Jun16 vs 1820 for InnoDB/5.6.35.

For MyRocks.Jun16 the flat profile is:

     9.98%  my-oneconnectio  mysqld               [.] rocksdb::MemTable::KeyComparator::operator()
     7.09%  my-oneconnectio  mysqld               [.] rocksdb::BlockIter::BinarySeek
     5.19%  my-oneconnectio  mysqld               [.] rocksdb::BlockIter::Seek
     4.50%  my-oneconnectio  libc-2.20.so         [.] __memcmp_sse4_1
     4.02%  my-oneconnectio  mysqld               [.] rocksdb::ThreadLocalPtr::Get
     3.82%  my-oneconnectio  mysqld               [.] rocksdb::LRUCacheShard::Lookup
     3.21%  my-oneconnectio  mysqld               [.] rocksdb::Block::NewIterator
     2.80%  my-oneconnectio  mysqld               [.] rocksdb::(anonymous namespace)::SkipListRep::Get
     2.75%  my-oneconnectio  mysqld               [.] rocksdb::InternalKeyComparator::Compare
     2.61%  my-oneconnectio  mysqld               [.] rocksdb::(anonymous namespace)::FullFilterBitsReader::MayMatch
     1.98%  my-oneconnectio  mysqld               [.] myrocks::Rdb_pk_comparator::Compare
     1.55%  my-oneconnectio  libpthread-2.20.so   [.] pthread_mutex_unlock
     1.54%  my-oneconnectio  mysqld               [.] rocksdb::Version::Get
     1.54%  my-oneconnectio  libc-2.20.so         [.] __memcpy_sse2_unaligned
     1.41%  my-oneconnectio  mysqld               [.] rocksdb::StatisticsImpl::recordTick
     1.36%  my-oneconnectio  mysqld               [.] rocksdb::get_perf_context
     1.02%  my-oneconnectio  mysqld               [.] rocksdb::HistogramStat::Add
     1.00%  my-oneconnectio  mysqld               [.] rocksdb::BlockBasedTable::Get
     0.89%  my-oneconnectio  mysqld               [.] myrocks::Rdb_key_def::unpack_record
     0.78%  my-oneconnectio  [vdso]               [.] 0x0000000000000cb5
     0.75%  my-oneconnectio  mysqld               [.] bmove_upp
     0.69%  my-oneconnectio  mysqld               [.] my_qsort2
     0.61%  my-oneconnectio  mysqld               [.] myrocks::Rdb_key_def::pack_index_tuple
     0.58%  my-oneconnectio  mysqld               [.] SEL_ARG::insert
     0.55%  my-oneconnectio  mysqld               [.] sel_cmp
     0.53%  my-oneconnectio  mysqld               [.] rocksdb::WriteBatchWithIndexInternal::GetFromBatch
     0.51%  my-oneconnectio  libpthread-2.20.so   [.] pthread_mutex_lock

mdcallag commented 7 years ago

Adding FORCE INDEX changes QPS for random-points with MyRocks.Jun16 from 750 to 761. I assume that is noise, so there is no significant benefit.

siying commented 7 years ago

rocksdb::ThreadLocalPtr::Get() mostly comes from the perf context regression, which we fixed shortly after it was released.

It's not clear to me how much of the binary search cost comes from Get() and how much from ApproximateSize(). Is there a way to include the call stack and see that?

mdcallag commented 7 years ago

CPU profiles shared offline. Removal of ThreadLocalPtr in new builds will help a bit, but won't do much to erase the 2X to 4X perf difference.

mdcallag commented 7 years ago

QPS increased from 750 to 920 after 'set global rocksdb_force_flush_memtable_now=1'. QPS then increased from 920 to 936 after 'set global rocksdb_force_flush_memtable_and_lzero_now=1'.
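
As a rough check on how much the flushes help, this sketch computes the relative gains from the QPS numbers quoted above and the gap that remains against the 1820 QPS reported earlier for InnoDB/5.6.35. It assumes all of these numbers come from the same abbreviated random-points test on the same server; `pct_gain` and `gap` are helper names invented for the illustration.

```shell
# pct_gain OLD NEW: percent change from OLD to NEW, one decimal.
pct_gain() {
  awk -v o="$1" -v n="$2" 'BEGIN { printf "%.1f", (n - o) * 100 / o }'
}

# gap NUM DEN: NUM/DEN rounded to two decimals.
gap() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f", n / d }'
}

echo "after memtable flush:           +$(pct_gain 750 920)%"
echo "after memtable_and_lzero flush: +$(pct_gain 750 936)% in total"
echo "remaining InnoDB/MyRocks gap:   $(gap 1820 936)x"
```

So flushing recovers a bit under a quarter of the throughput, but InnoDB is still close to 2X faster on this test.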

mdcallag commented 7 years ago

This depends on my helper scripts and my sysbench fork: https://github.com/mdcallag/sysbench https://github.com/mdcallag/mytools/tree/master/bench/sysbench.lua

Short-running repro script. Assuming it is saved as all2.sh, the command line is:

bash all2.sh 8 1000000 60 60 60 rocksdb 1 0 /data/mysql/myrocks/bin/mysql none /data/mysql/sysbench10

And the script is:

ntabs=$1
nrows=$2
readsecs=$3
writesecs=$4
insertsecs=$5
engine=$6
setup=$7
cleanup=$8
client=$9
tableoptions=${10}
sysbdir=${11}

# Full concurrency sweep; the second assignment overrides it so only 1 client is used here.
concurrency="1 2 4 8 16 24 32 40 48 64"
concurrency="1"

echo update-index
bash run.sh $ntabs $nrows $writesecs $engine $setup 0        update-index    100    $client $tableoptions $sysbdir 16

echo point-query
bash run.sh $ntabs $nrows $readsecs  $engine 0      0        point-query     100    $client $tableoptions $sysbdir $concurrency

echo random-points
bash run.sh $ntabs $nrows $readsecs  $engine 0      0        random-points   100    $client $tableoptions $sysbdir $concurrency

echo hot-points
bash run.sh $ntabs $nrows $readsecs  $engine 0      0        hot-points      100    $client $tableoptions $sysbdir $concurrency
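
The script reads 11 positional arguments and silently runs with empty values if some are missing. A minimal sketch of a guard that could sit at the top of the script is below; `check_args` is a hypothetical helper, not something in the mytools repository, and the usage string just lists the variable names the script assigns.

```shell
# check_args COUNT: succeed only when exactly 11 positional arguments were given.
check_args() {
  if [ "$1" -ne 11 ]; then
    echo "usage: all2.sh ntabs nrows readsecs writesecs insertsecs engine setup cleanup client tableoptions sysbdir" >&2
    return 1
  fi
}

# Inside all2.sh this would be called before the assignments as:
#   check_args "$#" || exit 1
```

The example command line above passes exactly 11 arguments, so it would pass this check.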