facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.
http://myrocks.io
Other
2.48k stars 713 forks source link

How to determine what engine is doing during a stall? #369

Closed greenlion closed 7 years ago

greenlion commented 7 years ago

I am loading five tables as part of the star schema benchmark.

I got an unusual stall loading one of the tables: -- fact table is expected to take a long time to load mysql> load data infile '/home/justin/src/ssb-dbgen/lineorder.tbl' into table lineorder fields terminated by '|' lines terminated by '|\n' ; Query OK, 119994608 rows affected (3 hours 42 min 11.51 sec) Records: 119994608 Deleted: 0 Skipped: 0 Warnings: 0

-- this load was fast mysql> load data infile '/home/justin/src/ssb-dbgen/date.tbl' into table dim_date fields terminated by '|' lines terminated by '|\n' ;Query OK, 2556 rows affected (0.04 sec) Records: 2556 Deleted: 0 Skipped: 0 Warnings: 0

-- this was about 5x slower than expected mysql> load data infile '/home/justin/src/ssb-dbgen/part.tbl' into table part fields terminated by '|' lines terminated by '|\n' ; Query OK, 1000000 rows affected (1 min 2.25 sec) Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0

-- then this one was fast again (though there was some delay between the prior and this one) mysql> load data infile '/home/justin/src/ssb-dbgen/supplier.tbl' into table supplier fields terminated by '|' lines terminated by '|\n' ; Query OK, 40000 rows affected (0.41 sec) Records: 40000 Deleted: 0 Skipped: 0 Warnings: 0

I tried loading the part table a second time, and it was fast:

mysql> rename table part to part2; Query OK, 0 rows affected (0.01 sec)

mysql> create table part like part2; Query OK, 0 rows affected (0.02 sec)

-- this time it loads faster mysql> load data infile '/home/justin/src/ssb-dbgen/part.tbl' into table part fields terminated by '|' lines terminated by '|\n' ; Query OK, 1000000 rows affected (15.17 sec) Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0

Do you have any hints about where to look / what to examine if I see this again? pmp?

mdcallag commented 7 years ago

From SHOW ENGINE ROCKSDB STATUS. There is some detail, but not enough, in https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

This is per column family and some of the counters are probably non-zero for you.

Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
yoshinorim commented 7 years ago

We'll update the "data loading" wiki after fixing all known bulk loading issues, but general recommendation is all secondary indexes should be added after loading data. In your case, consider the followings.

SET session rocksdb_bulk_load=1;
CREATE TABLE lineorder ... (primary key only)
LOAD DATA ...
ALTER TABLE lineorder ADD INDEX ...., ADD INDEX ...;

This is because secondary index modifications during load data will cause massive RocksDB compactions. If you monitor rocksdb_rows_inserted and iostat -xm, you'll often see high volume of writes to storage, even though rocksdb_rows_inserted was relatively lower. RocksDB also has a feature to slow down or suspend writes, if compactions can't keep up with writes -- rocksdb_rate_limiter_bytes_per_sec my.cnf variable and level0_slowdown_writes_trigger Column Family option are major parameters to control write speeds.

Fast index creation (ALTER TABLE .. ADD INDEX or CREATE INDEX) writes SST files in the bottommost RocksDB levels directly, bypassing compactions. So you won't see such stalls.

We'll update our wiki and describe these details once we fix known bugs. In your case, fast index creation will help.

greenlion commented 7 years ago

Have you considered implementing ALTER TABLE .. [DISABLE|ENABLE] KEYS?

greenlion commented 7 years ago

I'm trying to count the rows in the fact table (loaded with indexes in place) but RocksDB seems to be "stuck".

query is:

select count(*) from lineorder;

MySQL is using all of one core, and it isn't doing much IO:

[justin@localhost ssb-dbgen]$ iostat -kx 5
Linux 3.10.0-327.36.3.el7.x86_64 (localhost.localdomain)    10/29/2016  _x86_64_    (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.45    0.01    0.95    0.16    0.00   95.43

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.03     0.00    15.53     0.00    0.65    0.65    0.00   0.44   0.00
sdb               0.30     1.60    5.52   10.86   563.26  4703.35   642.85     1.55   94.75    0.77  142.52   0.87   1.43
sdc               0.00     0.00    0.00    0.00     0.02     0.00    14.83     0.00    3.64    3.64    0.00   3.41   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.07    0.00    1.93    0.55    0.00   85.44

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00  268.80    0.20  1552.80     0.80    11.55     0.05    0.17    0.17    0.00   0.17   4.60
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.36    0.00    1.93    0.53    0.00   85.18

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00  261.80    0.00  1464.80     0.00    11.19     0.04    0.17    0.17    0.00   0.17   4.32
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.22    0.00    1.86    0.50    0.00   85.42

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00  260.80    0.00  1464.80     0.00    11.23     0.04    0.16    0.16    0.00   0.16   4.06
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.60    0.00    2.03    0.50    0.00   84.86

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.60  245.40    3.00  1343.20    18.00    10.96     0.05    0.18    0.17    1.53   0.17   4.14
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.38    0.00    2.29    0.48    0.00   84.85

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00  235.60    1.40  1313.60   584.80    16.02     0.05    0.20    0.19    3.00   0.18   4.38
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

This is on a consumer grade SanDisk SSD. It can easily sustain higher IO rates.

The data doesn't appear to be buffered by MySQL:

[justin@localhost ssb-dbgen]$ free -m
              total        used        free      shared  buff/cache   available
Mem:          11753        1719         145          12        9888        9698
Swap:           999         459         540

[justin@localhost ssb-dbgen]$ sudo ./pmp

[sudo] password for justin: 
      2 pthread_cond_wait@@GLIBC_2.3.2,inline_mysql_cond_wait,at,block_pthread=true),do_handle_one_connection,handle_one_connection,start_thread,clone
      2 pthread_cond_wait@@GLIBC_2.3.2,ConditionWait,thread_id=thread_id@entry=0),rocksdb::BGThreadWrapper,start_thread,clone
      1 Slow_CRC32,pointer>,,out>,,Value,buf=0x7facd0b67ef0,footer=...,,rocksdb::(anonymous,Create,index_reader=index_reader@entry=0x7facf4a46c10,,rocksdb::BlockBasedTable::NewIndexIterator,rocksdb::BlockBasedTable::Get,rocksdb::TableCache::Get,rocksdb::Version::Get,rocksdb::DBImpl::GetImpl,rocksdb::DBImpl::Get,rocksdb::WriteBatchWithIndex::GetFromBatchAndDB,rocksdb::TransactionBaseImpl::Get,myrocks::Rdb_transaction_impl::get,myrocks::ha_rocksdb::get_row_by_rowid,get_row_by_rowid,keyno=<optimized,index_next_with_direction,buf=0x7facd000acf8,handle_unordered_next,buf=0x7facd000acf8,join_read_next,sub_select,do_select,at,mysql_execute_select,tables=0x7facd0005260,,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone
      1 sigwait,signal_hand,start_thread,clone
      1 sigwaitinfo,timer_notify_thread_func,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,<myrocks::rdb_drop_idx_thread>),myrocks::Rdb_thread::thread_func,start_thread,clone
      1 pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,<myrocks::rdb_bg_thread>),myrocks::Rdb_thread::thread_func,start_thread,clone
      1 poll,handle_connections_sockets,handle_connections_sockets_all,argv=0x1f90670),__libc_start_main,_start
      1 

perf

  85.37%  my-oneconnectio  mysqld               [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>
   7.21%  my-oneconnectio  [kernel.kallsyms]    [k] copy_user_generic_string
   0.51%  my-oneconnectio  [kernel.kallsyms]    [k] __find_get_page
   0.46%  mysqld           [kernel.kallsyms]    [k] native_write_msr_safe
   0.46%  my-oneconnectio  [kernel.kallsyms]    [k] file_read_actor
   0.45%  my-oneconnectio  [kernel.kallsyms]    [k] put_page
   0.21%  mysqld           [kernel.kallsyms]    [k] native_read_msr_safe
   0.16%  my-oneconnectio  mysqld               [.] rocksdb::BlockIter::BinarySeek
   0.16%  my-oneconnectio  [kernel.kallsyms]    [k] generic_file_aio_read
   0.15%  mysqld           libpthread-2.17.so   [.] pthread_cond_timedwait@@GLIBC_2.3.2
   0.14%  my-oneconnectio  [kernel.kallsyms]    [k] __radix_tree_lookup
   0.12%  my-oneconnectio  libc-2.17.so         [.] __memcmp_sse4_1
   0.12%  mysqld           [vdso]               [.] __vdso_clock_gettime
   0.10%  my-oneconnectio  libc-2.17.so         [.] _int_malloc
   0.10%  my-oneconnectio  [vdso]               [.] __vdso_gettimeofday
   0.09%  mysqld           [kernel.kallsyms]    [k] __audit_syscall_exit
   0.08%  my-oneconnectio  mysqld               [.] rocksdb::HistogramStat::Add
   0.08%  my-oneconnectio  libc-2.17.so         [.] __memcpy_ssse3
   0.08%  mysqld           [kernel.kallsyms]    [k] native_read_tsc
   0.08%  my-oneconnectio  libc-2.17.so         [.] malloc_consolidate
   0.08%  mysqld           [kernel.kallsyms]    [k] get_futex_key_refs.isra.11
   0.07%  my-oneconnectio  libc-2.17.so         [.] malloc
   0.07%  mysqld           [kernel.kallsyms]    [k] futex_wait_queue_me
   0.07%  my-oneconnectio  libc-2.17.so         [.] _int_free
   0.06%  my-oneconnectio  mysqld               [.] rocksdb::BlockIter::Seek
   0.06%  mysqld           [kernel.kallsyms]    [k] system_call
   0.06%  my-oneconnectio  [kernel.kallsyms]    [k] mark_page_accessed
   0.06%  mysqld           [kernel.kallsyms]    [k] system_call_after_swapgs
   0.06%  my-oneconnectio  mysqld               [.] rocksdb::StatisticsImpl::recordTick
   0.06%  mysqld           [kernel.kallsyms]    [k] _raw_spin_lock
   0.06%  my-oneconnectio  mysqld               [.] rocksdb::InternalKeyComparator::Compare
   0.06%  mysqld           [kernel.kallsyms]    [k] __schedule
   0.05%  mysqld           [kernel.kallsyms]    [k] cpuacct_charge
   0.05%  mysqld           [kernel.kallsyms]    [k] hash_futex
   0.05%  mysqld           [kernel.kallsyms]    [k] futex_wait
   0.05%  my-oneconnectio  libpthread-2.17.so   [.] pthread_mutex_unlock
greenlion commented 7 years ago

zoom into ::Fast_CRC32

  85.37%  my-oneconnectio  mysqld               [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>                              ▒
   7.21%  my-oneconnectio  [kernel.kallsyms]    [k] copy_user_generic_string                                                               ▒
   0.51%  my-oneconnectio  [kernel.kallsyms]    [k] __find_get_page                                                                        ▒
   0.46%  my-oneconnectio  [kernel.kallsyms]    [k] file_read_actor                                                                        ▒
   0.45%  my-oneconnectio  [kernel.kallsyms]    [k] put_page                                                                               ▒
   0.16%  my-oneconnectio  mysqld               [.] rocksdb::BlockIter::BinarySeek                                                         ▒
   0.16%  my-oneconnectio  [kernel.kallsyms]    [k] generic_file_aio_read                                                                  ▒
   0.14%  my-oneconnectio  [kernel.kallsyms]    [k] __radix_tree_lookup                                                                    ▒
   0.12%  my-oneconnectio  libc-2.17.so         [.] __memcmp_sse4_1                                                                        ▒
   0.10%  my-oneconnectio  libc-2.17.so         [.] _int_malloc                                                                            ▒
   0.10%  my-oneconnectio  [vdso]               [.] __vdso_gettimeofday                                                                    ▒
   0.08%  my-oneconnectio  mysqld               [.] rocksdb::HistogramStat::Add                                                            ▒
   0.08%  my-oneconnectio  libc-2.17.so         [.] __memcpy_ssse3                                                                         ▒
   0.08%  my-oneconnectio  libc-2.17.so         [.] malloc_consolidate                                                                     ▒
   0.07%  my-oneconnectio  libc-2.17.so         [.] malloc                         

zoom into copy_user_generic_string

   7.21%  my-oneconnectio  [k] copy_user_generic_string                                                                                    ▒
   0.51%  my-oneconnectio  [k] __find_get_page                                                                                             ▒
   0.46%  my-oneconnectio  [k] file_read_actor                                                                                             ▒
   0.45%  my-oneconnectio  [k] put_page                                                                                                    ▒
   0.16%  my-oneconnectio  [k] generic_file_aio_read                                                                                       ▒
   0.14%  my-oneconnectio  [k] __radix_tree_lookup                                                                                         ▒
   0.06%  my-oneconnectio  [k] mark_page_accessed                                                                                          ▒
   0.04%  my-oneconnectio  [k] _cond_resched                                                                                               ▒
   0.04%  my-oneconnectio  [k] radix_tree_lookup_slot                                                                                      ▒
   0.04%  my-oneconnectio  [k] native_write_msr_safe 
greenlion commented 7 years ago
mysql> show engine rocksdb status\G
*************************** 1. row ***************************
  Type: DBSTATS
  Name: rocksdb
Status: 
** DB Stats **
Uptime(secs): 57886.6 total, 855.1 interval
Cumulative writes: 613K writes, 611M keys, 612K commit groups, 1.0 writes per commit group, ingest: 18.76 GB, 0.33 MB/s
Cumulative WAL: 613K writes, 10 syncs, 55789.18 writes per sync, written: 18.76 GB, 0.33 MB/s
Cumulative stall: 03:13:20.714 H:M:S, 20.0 percent
Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
** Level 0 read latency histogram (micros):
Count: 4899572 Average: 1.9046  StdDev: 0.71
Min: 0  Median: 1.4151  Max: 89
Percentiles: P50: 1.42 P75: 1.75 P99: 3.38 P99.9: 10.70 P99.99: 14.96
------------------------------------------------------
[       0,       1 )   910537  18.584%  18.584% ####
[       1,       2 )  3708541  75.691%  94.275% ###############
[       2,       3 )   221917   4.529%  98.804% #
[       3,       4 )    25533   0.521%  99.326% 
[       4,       5 )    16139   0.329%  99.655% 
[       5,       6 )     5433   0.111%  99.766% 
[       6,       7 )     1816   0.037%  99.803% 
[       7,       8 )     1466   0.030%  99.833% 
[       8,       9 )     1034   0.021%  99.854% 
[       9,      10 )      976   0.020%  99.874% 
[      10,      12 )     3637   0.074%  99.948% 
[      12,      14 )     1845   0.038%  99.986% 
[      14,      16 )      432   0.009%  99.995% 
[      16,      18 )      145   0.003%  99.998% 
[      18,      20 )       33   0.001%  99.998% 
[      20,      25 )       37   0.001%  99.999% 
[      25,      30 )       17   0.000%  99.999% 
[      30,      35 )        6   0.000%  99.999% 
[      35,      40 )        9   0.000% 100.000% 
[      40,      45 )        3   0.000% 100.000% 
[      45,      50 )        8   0.000% 100.000% 
[      50,      60 )        2   0.000% 100.000% 
[      60,      70 )        4   0.000% 100.000% 
[      80,      90 )        2   0.000% 100.000% 

** Level 1 read latency histogram (micros):
Count: 44621056 Average: 1.8442  StdDev: 3.07
Min: 0  Median: 1.3628  Max: 7111
Percentiles: P50: 1.36 P75: 1.69 P99: 2.77 P99.9: 11.31 P99.99: 131.42
------------------------------------------------------
[       0,       1 ) 10087190  22.606%  22.606% #####
[       1,       2 ) 33692106  75.507%  98.114% ###############
[       2,       3 )   516302   1.157%  99.271% 
[       3,       4 )   111313   0.249%  99.520% 
[       4,       5 )    72270   0.162%  99.682% 
[       5,       6 )    23452   0.053%  99.735% 
[       6,       7 )    12091   0.027%  99.762% 
[       7,       8 )    14101   0.032%  99.793% 
[       8,       9 )    11069   0.025%  99.818% 
[       9,      10 )    12148   0.027%  99.845% 
[      10,      12 )    37118   0.083%  99.929% 
[      12,      14 )    15617   0.035%  99.964% 
[      14,      16 )     3009   0.007%  99.970% 
[      16,      18 )     1449   0.003%  99.974% 
[      18,      20 )      333   0.001%  99.974% 
[      20,      25 )      443   0.001%  99.975% 
[      25,      30 )     1047   0.002%  99.978% 
[      30,      35 )      950   0.002%  99.980% 
[      35,      40 )      360   0.001%  99.981% 
[      40,      45 )      331   0.001%  99.981% 
[      45,      50 )      331   0.001%  99.982% 
[      50,      60 )     1893   0.004%  99.986% 
[      60,      70 )      401   0.001%  99.987% 
[      70,      80 )      212   0.000%  99.988% 
[      80,      90 )      183   0.000%  99.988% 
[      90,     100 )      190   0.000%  99.988% 
[     100,     120 )      360   0.001%  99.989% 
[     120,     140 )      569   0.001%  99.991% 
[     140,     160 )     2316   0.005%  99.996% 
[     160,     180 )     1316   0.003%  99.999% 
[     180,     200 )      375   0.001% 100.000% 
[     200,     250 )      137   0.000% 100.000% 
[     250,     300 )       11   0.000% 100.000% 
[     300,     350 )        4   0.000% 100.000% 
[     350,     400 )        3   0.000% 100.000% 
[     400,     450 )        2   0.000% 100.000% 
[     450,     500 )        1   0.000% 100.000% 
[     500,     600 )        2   0.000% 100.000% 
[     600,     700 )        3   0.000% 100.000% 
[     700,     800 )        2   0.000% 100.000% 
[     800,     900 )        4   0.000% 100.000% 
[     900,    1000 )        3   0.000% 100.000% 
[    1000,    1200 )        5   0.000% 100.000% 
[    1200,    1400 )        6   0.000% 100.000% 
[    1400,    1600 )        5   0.000% 100.000% 
[    1600,    1800 )        2   0.000% 100.000% 
[    1800,    2000 )        6   0.000% 100.000% 
[    2000,    2500 )        3   0.000% 100.000% 
[    2500,    3000 )        4   0.000% 100.000% 
[    3000,    3500 )        5   0.000% 100.000% 
[    3500,    4000 )        1   0.000% 100.000% 
[    5000,    6000 )        1   0.000% 100.000% 
[    7000,    8000 )        1   0.000% 100.000% 

** Level 2 read latency histogram (micros):
Count: 66811738 Average: 3.4027  StdDev: 58.92
Min: 0  Median: 1.3528  Max: 104140
Percentiles: P50: 1.35 P75: 1.71 P99: 57.07 P99.9: 135.85 P99.99: 235.76
------------------------------------------------------
[       0,       1 ) 16717277  25.021%  25.021% #####
[       1,       2 ) 47300070  70.796%  95.818% ##############
[       2,       3 )   788252   1.180%  96.997% 
[       3,       4 )   164175   0.246%  97.243% 
[       4,       5 )   110904   0.166%  97.409% 
[       5,       6 )    35953   0.054%  97.463% 
[       6,       7 )    19794   0.030%  97.492% 
[       7,       8 )    20841   0.031%  97.524% 
[       8,       9 )    16181   0.024%  97.548% 
[       9,      10 )    18388   0.028%  97.575% 
[      10,      12 )    54188   0.081%  97.657% 
[      12,      14 )    21559   0.032%  97.689% 
[      14,      16 )     4575   0.007%  97.696% 
[      16,      18 )     2116   0.003%  97.699% 
[      18,      20 )      496   0.001%  97.700% 
[      20,      25 )      740   0.001%  97.701% 
[      25,      30 )    28022   0.042%  97.743% 
[      30,      35 )    23735   0.036%  97.778% 
[      35,      40 )     4377   0.007%  97.785% 
[      40,      45 )     1955   0.003%  97.788% 
[      45,      50 )   146264   0.219%  98.007% 
[      50,      60 )   939398   1.406%  99.413% 
[      60,      70 )   132744   0.199%  99.611% 
[      70,      80 )   105533   0.158%  99.769% 
[      80,      90 )    36878   0.055%  99.824% 
[      90,     100 )    19838   0.030%  99.854% 
[     100,     120 )    24166   0.036%  99.890% 
[     120,     140 )     8210   0.012%  99.903% 
[     140,     160 )    10543   0.016%  99.918% 
[     160,     180 )    15630   0.023%  99.942% 
[     180,     200 )    21667   0.032%  99.974% 
[     200,     250 )    14802   0.022%  99.996% 
[     250,     300 )      484   0.001%  99.997% 
[     300,     350 )       81   0.000%  99.997% 
[     350,     400 )       50   0.000%  99.997% 
[     400,     450 )       58   0.000%  99.997% 
[     450,     500 )       30   0.000%  99.997% 
[     500,     600 )      100   0.000%  99.998% 
[     600,     700 )       58   0.000%  99.998% 
[     700,     800 )       58   0.000%  99.998% 
[     800,     900 )       70   0.000%  99.998% 
[     900,    1000 )       35   0.000%  99.998% 
[    1000,    1200 )       92   0.000%  99.998% 
[    1200,    1400 )       67   0.000%  99.998% 
[    1400,    1600 )       48   0.000%  99.998% 
[    1600,    1800 )       45   0.000%  99.998% 
[    1800,    2000 )       61   0.000%  99.998% 
[    2000,    2500 )      168   0.000%  99.999% 
[    2500,    3000 )      121   0.000%  99.999% 
[    3000,    3500 )      101   0.000%  99.999% 
[    3500,    4000 )       60   0.000%  99.999% 
[    4000,    4500 )       37   0.000%  99.999% 
[    4500,    5000 )       60   0.000%  99.999% 
[    5000,    6000 )       50   0.000%  99.999% 
[    6000,    7000 )       56   0.000%  99.999% 
[    7000,    8000 )       38   0.000%  99.999% 
[    8000,    9000 )       48   0.000%  99.999% 
[    9000,   10000 )       38   0.000%  99.999% 
[   10000,   12000 )       55   0.000% 100.000% 
[   12000,   14000 )       30   0.000% 100.000% 
[   14000,   16000 )       26   0.000% 100.000% 
[   16000,   18000 )       26   0.000% 100.000% 
[   18000,   20000 )       36   0.000% 100.000% 
[   20000,   25000 )       30   0.000% 100.000% 
[   25000,   30000 )       27   0.000% 100.000% 
[   30000,   35000 )       29   0.000% 100.000% 
[   35000,   40000 )       19   0.000% 100.000% 
[   40000,   45000 )        9   0.000% 100.000% 
[   45000,   50000 )       13   0.000% 100.000% 
[   50000,   60000 )       20   0.000% 100.000% 
[   60000,   70000 )        8   0.000% 100.000% 
[   70000,   80000 )        1   0.000% 100.000% 
[   80000,   90000 )        9   0.000% 100.000% 
[   90000,  100000 )       12   0.000% 100.000% 
[  100000,  120000 )        3   0.000% 100.000% 

** Level 3 read latency histogram (micros):
Count: 10252766 Average: 19.5681  StdDev: 196.14
Min: 0  Median: 1.6620  Max: 162157
Percentiles: P50: 1.66 P75: 46.39 P99: 115.02 P99.9: 224.72 P99.99: 1698.46
------------------------------------------------------
[       0,       1 )  1410496  13.757%  13.757% ###
[       1,       2 )  5613165  54.748%  68.505% ###########
[       2,       3 )   263515   2.570%  71.075% #
[       3,       4 )    27262   0.266%  71.341% 
[       4,       5 )    18586   0.181%  71.522% 
[       5,       6 )     7120   0.069%  71.592% 
[       6,       7 )     4294   0.042%  71.634% 
[       7,       8 )     3568   0.035%  71.669% 
[       8,       9 )     2188   0.021%  71.690% 
[       9,      10 )     2354   0.023%  71.713% 
[      10,      12 )     7704   0.075%  71.788% 
[      12,      14 )     3751   0.037%  71.825% 
[      14,      16 )     1191   0.012%  71.836% 
[      16,      18 )      555   0.005%  71.842% 
[      18,      20 )      262   0.003%  71.844% 
[      20,      25 )      575   0.006%  71.850% 
[      25,      30 )    94175   0.919%  72.768% 
[      30,      35 )    66727   0.651%  73.419% 
[      35,      40 )    10417   0.102%  73.521% 
[      40,      45 )   126835   1.237%  74.758% 
[      45,      50 )    89476   0.873%  75.630% 
[      50,      60 )  1756932  17.136%  92.767% ###
[      60,      70 )   269158   2.625%  95.392% #
[      70,      80 )   182039   1.776%  97.167% 
[      80,      90 )    95225   0.929%  98.096% 
[      90,     100 )    49150   0.479%  98.576% 
[     100,     120 )    57954   0.565%  99.141% 
[     120,     140 )    20915   0.204%  99.345% 
[     140,     160 )    10467   0.102%  99.447% 
[     160,     180 )    16682   0.163%  99.610% 
[     180,     200 )    21838   0.213%  99.823% 
[     200,     250 )    16055   0.157%  99.979% 
[     250,     300 )      601   0.006%  99.985% 
[     300,     350 )       57   0.001%  99.986% 
[     350,     400 )       43   0.000%  99.986% 
[     400,     450 )       17   0.000%  99.986% 
[     450,     500 )       17   0.000%  99.986% 
[     500,     600 )       94   0.001%  99.987% 
[     600,     700 )       59   0.001%  99.988% 
[     700,     800 )       29   0.000%  99.988% 
[     800,     900 )       22   0.000%  99.988% 
[     900,    1000 )       34   0.000%  99.989% 
[    1000,    1200 )       42   0.000%  99.989% 
[    1200,    1400 )       43   0.000%  99.989% 
[    1400,    1600 )       34   0.000%  99.990% 
[    1600,    1800 )       36   0.000%  99.990% 
[    1800,    2000 )       39   0.000%  99.991% 
[    2000,    2500 )      212   0.002%  99.993% 
[    2500,    3000 )      197   0.002%  99.995% 
[    3000,    3500 )       76   0.001%  99.995% 
[    3500,    4000 )       58   0.001%  99.996% 
[    4000,    4500 )       30   0.000%  99.996% 
[    4500,    5000 )       34   0.000%  99.996% 
[    5000,    6000 )       45   0.000%  99.997% 
[    6000,    7000 )       31   0.000%  99.997% 
[    7000,    8000 )       33   0.000%  99.998% 
[    8000,    9000 )       20   0.000%  99.998% 
[    9000,   10000 )       14   0.000%  99.998% 
[   10000,   12000 )       32   0.000%  99.998% 
[   12000,   14000 )       28   0.000%  99.998% 
[   14000,   16000 )       18   0.000%  99.999% 
[   16000,   18000 )        9   0.000%  99.999% 
[   18000,   20000 )       14   0.000%  99.999% 
[   20000,   25000 )       30   0.000%  99.999% 
[   25000,   30000 )       12   0.000%  99.999% 
[   30000,   35000 )        8   0.000%  99.999% 
[   35000,   40000 )       13   0.000%  99.999% 
[   40000,   45000 )        7   0.000% 100.000% 
[   50000,   60000 )       10   0.000% 100.000% 
[   60000,   70000 )        5   0.000% 100.000% 
[   70000,   80000 )       16   0.000% 100.000% 
[   80000,   90000 )       13   0.000% 100.000% 
[   90000,  100000 )        1   0.000% 100.000% 
[  120000,  140000 )        1   0.000% 100.000% 
[  160000,  180000 )        1   0.000% 100.000% 

** Level 6 read latency histogram (micros):
Count: 29 Average: 58.7586  StdDev: 116.89
Min: 0  Median: 2.2500  Max: 445
Percentiles: P50: 2.25 P75: 87.50 P99: 442.75 P99.9: 445.00 P99.99: 445.00
------------------------------------------------------
[       0,       1 )       12  41.379%  41.379% ########
[       1,       2 )        2   6.897%  48.276% #
[       2,       3 )        2   6.897%  55.172% #
[       3,       4 )        2   6.897%  62.069% #
[       4,       5 )        1   3.448%  65.517% #
[       5,       6 )        2   6.897%  72.414% #
[      80,      90 )        1   3.448%  75.862% #
[      90,     100 )        1   3.448%  79.310% #
[     100,     120 )        1   3.448%  82.759% #
[     120,     140 )        2   6.897%  89.655% #
[     200,     250 )        1   3.448%  93.103% #
[     400,     450 )        2   6.897% 100.000% #

*************************** 2. row ***************************
  Type: CF_COMPACTION
  Name: __system__
Status: 
** Compaction Stats [__system__] **
Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      2/0       0.03   0.5      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      4.9         0        73    0.003       0      0
  L1      1/0       0.02   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.3     24.2      6.3         0        18    0.004     18K    11K
 Sum      3/0       0.06   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.3      5.0      5.2         0        91    0.003     18K    11K
 Int      0/0       0.00   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
Uptime(secs): 57886.6 total, 57886.6 interval
Flush(GB): cumulative 0.001, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.3 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

*************************** 3. row ***************************
  Type: CF_COMPACTION
  Name: cf_link_pk
Status: 
** Compaction Stats [cf_link_pk] **
Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sum      0/0       0.00   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
 Int      0/0       0.00   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
Uptime(secs): 57886.6 total, 57886.6 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

*************************** 4. row ***************************
  Type: CF_COMPACTION
  Name: default
Status: 
** Compaction Stats [default] **
Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0       0.94   0.2      0.0     0.0      0.0      18.3     18.3       0.0   0.0      0.0     56.9       329      8360    0.039       0      0
  L1      7/0     200.02   0.8    169.4    18.3    151.2     168.6     17.5       0.0   9.2     26.6     26.5      6520       483   13.500   4807M   286K
  L2     42/0    2540.07   1.0    248.4    16.8    231.6     247.9     16.3       0.4  14.7     43.0     42.9      5919       262   22.591   6449M      0
  L3    225/0   14167.48   0.6     39.9    12.1     27.9      39.6     11.7       2.1   3.3     28.9     28.6      1415       140   10.108   1765M  3713K
  L6      4/0     151.28   0.0      0.0     0.0      0.0       0.2      0.2       0.0   0.0      0.0   6795.9         0         5    0.007       0      0
 Sum    279/0   17059.78   0.0    457.8    47.2    410.6     474.6     63.9       2.6  25.6     33.1     34.3     14183      9250    1.533     13G  4000K
 Int      0/0       0.00   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
Uptime(secs): 57886.6 total, 57886.6 interval
Flush(GB): cumulative 18.273, interval 0.000
AddFile(GB): cumulative 0.234, interval 0.000
AddFile(Total Files): cumulative 5, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 2642556, interval 0
Cumulative compaction: 474.58 GB write, 8.40 MB/s write, 457.83 GB read, 8.10 MB/s read, 14183.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 1181 level0_slowdown, 385 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 3 memtable_compaction, 0 memtable_slowdown, interval 0 total count

*************************** 5. row ***************************
  Type: CF_COMPACTION
  Name: rev:cf_link_id1_type
Status: 
** Compaction Stats [rev:cf_link_id1_type] **
Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sum      0/0       0.00   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
 Int      0/0       0.00   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000       0      0
Uptime(secs): 57886.6 total, 57886.6 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

*************************** 6. row ***************************
  Type: Memory_Stats
  Name: rocksdb
Status: 
MemTable Total: 7694016
MemTable Unflushed: 96952
Table Readers Total: 0
Cache Total: 9931621
Default Cache Capacity: 0
6 rows in set (0.01 sec)
greenlion commented 7 years ago

It is reading about 2K rows/second from the table. This query would take ~55500 seconds at the current rate of reads.

greenlion commented 7 years ago

The .rocksdb folder is 17GB. Raw data was ~12GB of data.

greenlion commented 7 years ago

Loading data without indexes was significantly faster, and the count completes quickly after loading. mysql> select count() from lineorder; +-----------+ | count() | +-----------+ | 119994608 | +-----------+ 1 row in set (2 min 3.76 sec)

Size on disk: 9.9GB

I will add the indexes and continue testing.

mdcallag commented 7 years ago

Can you grep $datadir/.rocksdb/LOG for "Fast CRC32 supported"

First guess, fast crc32 was not used, fast means HW accelerated based on seeing this in PMP -> "Slow_CRC32" and this in perf -> 85.37% my-oneconnectio mysqld [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>

Note there is a compile-time and run-time choice that determines whether fast crc32 is used. At compile time, support is enabled when SSE4_2 and LP64 are defined: https://github.com/facebook/rocksdb/blob/master/util/crc32c.cc#L318

Runtime check that determines whether it can be used is at: https://github.com/facebook/rocksdb/blob/master/util/crc32c.cc#L391

greenlion commented 7 years ago

There is no mention of CRC (I did case insensitive search) in the LOG file.

Here is processor info:

processor : 7 vendor_id : AuthenticAMD cpu family : 21 model : 2 model name : AMD FX(tm)-8350 Eight-Core Processor
stepping : 0 microcode : 0x600081f cpu MHz : 4013.463 cache size : 2048 KB physical id : 0 siblings : 8 core id : 7 cpu cores : 4 apicid : 23 initial apicid : 7 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1 bogomips : 8026.92 TLB size : 1536 4K pages clflush size : 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

mdcallag commented 7 years ago

Your CPU supports sse4_2. On my Ubuntu 16.04 VM that runs on my Mac laptop with core i7 pure RocksDB builds (https://github.com/facebook/rocksdb) get fast crc32.

The difference below occurs because the pure-RocksDB build on my laptop is done with -the -march=native flag. The embedded RocksDB build used by FB MySQL does not use the -march=native flag. I will file a separate issue for the build problem. I have encountered this before.

This is output from the preprocessor for util/crc32c.cc

static inline void Fast_CRC32(uint64_t* l, uint8_t const **p) {
  *l = _mm_crc32_u64(*l, LE_LOAD64(*p));
  *p += 8;
# 332 "util/crc32c.cc"
}

static bool isSSE42() {
  uint32_t c_;
  uint32_t d_;
  __asm__("cpuid" : "=c"(c_), "=d"(d_) : "a"(1) : "ebx");
  return c_ & (1U << 20);
}

Same VM, but from the embedded RocksDB build done as part of FB MySQL, fast crc32 is not enabled at compile-time:

static inline void Fast_CRC32(uint64_t* l, uint8_t const **p) {
# 330 "/home/mdcallag/git/fbmysql/rocksdb/util/crc32c.cc"
  Slow_CRC32(l, p);
}

static bool isSSE42() {
  uint32_t c_;
  uint32_t d_;
  __asm__("cpuid" : "=c"(c_), "=d"(d_) : "a"(1) : "ebx");
  return c_ & (1U << 20);
}

Compiler command lines from my Ubuntu VM. First is RocksDB embedded by FB MySQL:

/usr/bin/c++   -DHAVE_CONFIG_H -DHAVE_SCHED_GETCPU=1 -DOS_LINUX -DROCKSDB_LIB_IO_POSIX -DROCKSDB_PLATFORM_POSIX -DZLIB -I/home/mdcallag/git/fbmysql/include -I/home/mdcallag/git/fbmysql/rocksdb -I/home/mdcallag/git/fbmysql/rocksdb/include -I/home/mdcallag/git/fbmysql/rocksdb/third-party/gtest-1.7.0/fused-src -I/home/mdcallag/git/fbmysql/sql -I/home/mdcallag/git/fbmysql/regex -I/home/mdcallag/git/fbmysql/zlib  -Wall -Wextra -Wformat-security -Woverloaded-virtual -Wno-unused-parameter -std=c++11 -O3 -g -fabi-version=2 -fno-omit-frame-pointer -fno-strict-aliasing -DDBUG_OFF  -c /home/mdcallag/git/fbmysql/rocksdb/util/crc32c.cc

Next is pure-RocksDB

g++  -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wno-unused-parameter -I. -I./include -std=c++11  -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX  -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DGFLAGS=gflags -DZLIB -DBZIP2 -DLZ4 -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -march=native  "-DROCKSDB_JEMALLOC"  -isystem ./third-party/gtest-1.7.0/fused-src -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -DNDEBUG -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -c util/crc32c.cc
mdcallag commented 7 years ago

Can you determine whether rocksdb/util/crc32.cc is compiled with -march=native?

https://github.com/facebook/rocksdb/issues/1416 prevents whether CRC32 is enabled from getting logged to $datadir/.rocksdb/LOG. The workaround is to change the log level. Default is error -> "rocksdb_info_log_level error_level". Add this to my.cnf: rocksdb_info_log_level=info_level

And then grep again. I see this on my VM:

016/10/29-08:48:36.270223 7f30a69f1940 Compression algorithms supported:
2016/10/29-08:48:36.270226 7f30a69f1940         Snappy supported: 0
2016/10/29-08:48:36.270228 7f30a69f1940         Zlib supported: 1
2016/10/29-08:48:36.270231 7f30a69f1940         Bzip supported: 0
2016/10/29-08:48:36.270233 7f30a69f1940         LZ4 supported: 0
2016/10/29-08:48:36.270236 7f30a69f1940 Fast CRC32 supported: 0
greenlion commented 7 years ago

I will do so ASAP. I have some other things on my plate ATM.

mdcallag commented 7 years ago

Workaround for getting fast crc32 support is described in https://github.com/facebook/mysql-5.6/issues/370#issuecomment-257102068

greenlion commented 7 years ago

After adding indexes, the query now takes forever. I'll try to get fast crc32 support per @mdcallag suggestion and see what happens. But I think it is important to note that table scans are impossible without fast CRC32 support. InnoDB is slower with native checksums but not so slow that the engine becomes unusable.

just fyi, after adding the indexes the CRC is the top consumer of time.

  75.93%  my-oneconnectio  mysqld               [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>
   5.94%  my-oneconnectio  [kernel.kallsyms]    [k] copy_user_generic_string
   2.36%  mysqld           [kernel.kallsyms]    [k] native_write_msr_safe
   1.18%  mysqld           [kernel.kallsyms]    [k] native_read_msr_safe
   0.75%  mysqld           libpthread-2.17.so   [.] pthread_cond_timedwait@@GLIBC_2.3.2
   0.57%  mysqld           [vdso]               [.] __vdso_clock_gettime
mdcallag commented 7 years ago

Thanks. I will try scans with & without HW crc32 on one of my servers to understand the impact. I might not have much time for debugging the next 2 weeks because of work.

I wonder if we have people using RocksDB on non x86 platforms and whether we will get faster HW crc32 there, or a faster software crc32 for them.

greenlion commented 7 years ago

I took a look in crc32c.cc - shouldn't the above stack show Slow_CRC32 if the slower CRC32 is being used? As a test I added -DFORCE_CRC32=1 to CMAKE_CXX_FLAGS and I put an #ifdef in to make sure the _64 function was being called. It compiled fine, but I still see all the time in the Fast_CRC32 function.

greenlion commented 7 years ago

Just to be extra sure I modified rocksdb::crc32c.c::Fast_CRC32 to be simply:

static inline void Fast_CRC32(uint64_t* l, uint8_t const **p) {
  *l = _mm_crc32_u64(*l, LE_LOAD64(*p));
  *p += 8;
}

Thus FastCRC is being used for sure. For some reason, after adding indexes to this data, the server gets bogged down by excessive crc32 operations when reading from the index.

index scan based read:

mysql> explain select count(*) From lineorder;
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
| id | select_type | table     | type  | possible_keys | key             | key_len | ref  | rows      | Extra       |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
|  1 | SIMPLE      | lineorder | index | NULL          | LO_OrderDateKey | 4       | NULL | 119966000 | Using index |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
1 row in set (0.00 sec)

If I force a full table scan instead of a full index scan, the query returns results more quickly:

mysql> explain select count(*) From lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows      | Extra       |
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
|  1 | SIMPLE      | lineorder | ALL  | NULL          | NULL | NULL    | NULL | 119966000 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) From lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+-----------+
| count(*)  |
+-----------+
| 119994608 |
+-----------+
1 row in set (3 min 43.48 sec)

perf output for full table scan

   3.70%  mysqld   mysqld               [.] rocksdb::MergingIterator::Next                                                                 ▒
   3.41%  mysqld   [kernel.kallsyms]    [k] native_write_msr_safe                                                                          ▒
   3.33%  mysqld   libc-2.17.so         [.] __memcpy_ssse3                                                                                 ▒
   3.31%  mysqld   [vdso]               [.] __vdso_getcpu                                                                                  ▒
   3.09%  mysqld   mysqld               [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>                                      ▒
   2.79%  mysqld   mysqld               [.] myrocks::ha_rocksdb::convert_record_from_storage_format                                        ▒
   2.29%  mysqld   mysqld               [.] myrocks::ha_rocksdb::rnd_next_with_direction                                                   ▒
   2.12%  mysqld   mysqld               [.] rocksdb::DBIter::Next                                                                          ▒
   1.95%  mysqld   mysqld               [.] rocksdb::BlockIter::Next         
mdcallag commented 7 years ago

Excellent questions but I have slide decks for 2 talks that need to be made better.

mdcallag commented 7 years ago

When I return I will try your tests for InnoDB and RocksDB with and without fast crc32. The only thing I don't have is an AMD CPU.

greenlion commented 7 years ago

Are you saying my code modification is using slow CRC? I changed the code to only call the SSE function, and the perf output would have shown slow_crc32 if the #ifdef wasn't running the hardware function.

greenlion commented 7 years ago

I'll hook up my 12 core i970 and run the same test there. I'll let you know what I find.

greenlion commented 7 years ago

As configured on the i970.
Per your workaround, I added -march=native -msse -msse4.2 to CMAKE_CXX_FLAGS -- Running cmake version 2.8.11 -- MySQL 5.6.27 -- Packaging as: mysql-5.6.27-Linux-x86_64 -- HAVE_VISIBILITY_HIDDEN -- OPENSSL_INCLUDE_DIR = /usr/include -- OPENSSL_LIBRARY = /usr/lib64/libssl.so -- CRYPTO_LIBRARY = /usr/lib64/libcrypto.so -- OPENSSL_MAJOR_VERSION = 1 -- SSL_LIBRARIES = /usr/lib64/libssl.so;/usr/lib64/libcrypto.so;dl -- Using cmake version 2.8.11 -- Not building NDB -- Library webscalesqlclient depends on OSLIBS -lpthread;m;rt;/usr/lib64/libssl.so;/usr/lib64/libcrypto.so;dl -- Library mysqlserver depends on OSLIBS -lpthread;m;rt;/usr/lib64/libssl.so;/usr/lib64/libcrypto.so;dl;crypt;aio;numa;snappy;lz4;bz2;-lrt -- CMAKE_BUILD_TYPE: RelWithDebInfo -- COMPILE_DEFINITIONS: HAVE_CONFIG_H -- CMAKE_C_FLAGS: -Wall -Wextra -Wformat-security -Wwrite-strings -std=c99 -- CMAKE_CXX_FLAGS: -march=native -msse -msse4.2 -Wall -Wextra -Wformat-security -Woverloaded-virtual -Wno-unused-parameter -std=c++11 -- CMAKE_C_FLAGS_RELWITHDEBINFO: -O3 -g -fabi-version=2 -fno-omit-frame-pointer -fno-strict-aliasing -DDBUG_OFF -- CMAKE_CXX_FLAGS_RELWITHDEBINFO: -O3 -g -fabi-version=2 -fno-omit-frame-pointer -fno-strict-aliasing -DDBUG_OFF -- Configuring done -- Generating done -- Build files have been written to: /home/justin/src/myrocks/build

processor : 11 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz stepping : 2 microcode : 0x13 cpu MHz : 1600.000 cache size : 12288 KB physical id : 0 siblings : 12 core id : 10 cpu cores : 6 apicid : 21 initial apicid : 21 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid bogomips : 6408.21 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:

Here is the perf from the index scan on the i970:

38.35%  mysqld   mysqld               [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>
  28.94%  mysqld   [kernel.kallsyms]    [k] copy_user_generic_string
   2.44%  mysqld   [kernel.kallsyms]    [k] _raw_spin_unlock_irqrestore
   1.47%  mysqld   [kernel.kallsyms]    [k] put_page
   1.19%  mysqld   [kernel.kallsyms]    [k] native_read_tsc
   1.08%  mysqld   [kernel.kallsyms]    [k] delay_tsc
   0.86%  mysqld   [kernel.kallsyms]    [k] file_read_actor
   0.82%  mysqld   [kernel.kallsyms]    [k] __find_get_page
   0.66%  mysqld   [kernel.kallsyms]    [k] generic_file_aio_read
   0.65%  mysqld   [kernel.kallsyms]    [k] iowrite8
   0.62%  mysqld   [kernel.kallsyms]    [k] __radix_tree_lookup
   0.49%  mysqld   [kernel.kallsyms]    [k] __audit_syscall_exit
   0.48%  mysqld   libc-2.17.so         [.] __memcmp_sse4_1
   0.47%  mysqld   mysqld               [.] rocksdb::BlockIter::BinarySeek
greenlion commented 7 years ago

Here is the diff to force fast CRC32. Fast CRC32 is in use on both of my machines, and the index scan is very slow on both AMD64 and Intel Core i7.

diff --git a/util/crc32c.cc b/util/crc32c.cc
index ce57454..d933e51 100644
--- a/util/crc32c.cc
+++ b/util/crc32c.cc
@@ -316,19 +316,8 @@ static inline void Slow_CRC32(uint64_t* l, uint8_t const **p) {
 }

 static inline void Fast_CRC32(uint64_t* l, uint8_t const **p) {
-#ifdef __SSE4_2__
-#ifdef __LP64__
   *l = _mm_crc32_u64(*l, LE_LOAD64(*p));
   *p += 8;
-#else
-  *l = _mm_crc32_u32(static_cast<unsigned int>(*l), LE_LOAD32(*p));
-  *p += 4;
-  *l = _mm_crc32_u32(static_cast<unsigned int>(*l), LE_LOAD32(*p));
-  *p += 4;
-#endif
-#else
-  Slow_CRC32(l, p);
-#endif
 }

 template<void (*CRC32)(uint64_t*, uint8_t const**)>
@@ -389,10 +378,12 @@ static bool isSSE42() {
 typedef uint32_t (*Function)(uint32_t, const char*, size_t);

 static inline Function Choose_Extend() {
-  return isSSE42() ? ExtendImpl<Fast_CRC32> : ExtendImpl<Slow_CRC32>;
+  // return isSSE42() ? ExtendImpl<Fast_CRC32> : ExtendImpl<Slow_CRC32>;
+  return ExtendImpl<Fast_CRC32>;
 }

 bool IsFastCrc32Supported() {
+  return true;
 #ifdef __SSE4_2__
   return isSSE42();
 #else
greenlion commented 7 years ago

Even a basic aggregate query takes a very long time:

mysql> explain select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
| id | select_type | table     | type | possible_keys   | key             | key_len | ref   | rows | Extra       |
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
|  1 | SIMPLE      | lineorder | ref  | LO_OrderDateKey | LO_OrderDateKey | 4       | const | 6613 | Using index |
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----------+
| count(*) |
+----------+
|    49519 |
+----------+
1 row in set (11.43 sec)
greenlion commented 7 years ago

Load time for uncompressed InnoDB table: mysql> load data infile '/home/justin/src/ssb-dbgen/lineorder.tbl' into table lineorder fields terminated by '|' lines terminated by '|\n'; Query OK, 119994608 rows affected (27 min 46.46 sec) Records: 119994608 Deleted: 0 Skipped: 0 Warnings: 0

mysql> alter table lineorder add key(LO_OrderDateKey), add key(LO_CustKey), add key(LO_PartKey), add key(LO_SuppKey); Query OK, 0 rows affected (21 min 29.79 sec) Records: 0 Duplicates: 0 Warnings: 0

InnoDB uncompressed size on disk: 20G

full index scan

mysql> explain select count(*) from lineorder;
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
| id | select_type | table     | type  | possible_keys | key             | key_len | ref  | rows      | Extra       |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
|  1 | SIMPLE      | lineorder | index | NULL          | LO_OrderDateKey | 4       | NULL | 116404108 | Using index |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
1 row in set (0.01 sec)

mysql> select count(*) from lineorder;
+-----------+
| count(*)  |
+-----------+
| 119994609 |
+-----------+
1 row in set (36.49 sec)

Full table scan

mysql> explain select count(*) from lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows      | Extra       |
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
|  1 | SIMPLE      | lineorder | ALL  | NULL          | NULL | NULL    | NULL | 116404108 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
1 row in set (0.00 sec)

mysql> select count(*) from lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+-----------+
| count(*)  |
+-----------+
| 119994609 |
+-----------+
1 row in set (1 min 10.86 sec)

aggregate query

mysql> select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----------+
| count(*) |
+----------+
|    49519 |
+----------+
1 row in set (0.05 sec)
greenlion commented 7 years ago

InnoDB compressed table, KEY_BLOCK_SIZE=8

mysql> load data infile '/home/justin/src/ssb-dbgen/lineorder.tbl' into table lineorder fields terminated by '|' lines terminated by '|\n';Query OK, 119994608 rows affected (1 hour 26 min 51.56 sec)
Records: 119994608  Deleted: 0  Skipped: 0  Warnings: 0

mysql> alter table lineorder add key(LO_OrderDateKey), add key(LO_CustKey), add key(LO_PartKey), add key(LO_SuppKey);Query OK, 0 rows affected (36 min 53.69 sec)
Records: 0  Duplicates: 0  Warnings: 0

On disk size: 9.9GB

index scan on compress data

mysql> explain select count(*) from lineorder ;
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
| id | select_type | table     | type  | possible_keys | key             | key_len | ref  | rows      | Extra       |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
|  1 | SIMPLE      | lineorder | index | NULL          | LO_OrderDateKey | 4       | NULL | 116426330 | Using index |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
1 row in set (0.00 sec)

mysql>  select count(*) from lineorder ;
+-----------+
| count(*)  |
+-----------+
| 119994608 |
+-----------+
1 row in set (54.78 sec)

FTS on compressed data

mysql> select count(*) from lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+-----------+
| count(*)  |
+-----------+
| 119994608 |
+-----------+
1 row in set (3 min 22.84 sec)

aggregate over compressed data

mysql> select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----------+
| count(*) |
+----------+
|    49519 |
+----------+
1 row in set (0.04 sec)
mdcallag commented 7 years ago

Can you share SHOW ENGINE ROCKSDB STATUS output from the time when the point query is slow?

On Sat, Oct 29, 2016 at 5:45 PM, Justin Swanhart notifications@github.com wrote:

Load time for uncompressed InnoDB table: mysql> load data infile '/home/justin/src/ssb-dbgen/lineorder.tbl' into table lineorder fields terminated by '|' lines terminated by '|\n'; Query OK, 119994608 rows affected (27 min 46.46 sec) Records: 119994608 Deleted: 0 Skipped: 0 Warnings: 0

mysql> alter table lineorder add key(LO_OrderDateKey), add key(LO_CustKey), add key(LO_PartKey), add key(LO_SuppKey); Query OK, 0 rows affected (21 min 29.79 sec) Records: 0 Duplicates: 0 Warnings: 0

InnoDB uncompressed size on disk: 20G

full index scan

mysql> explain select count(*) from lineorder; +----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+ | 1 | SIMPLE | lineorder | index | NULL | LO_OrderDateKey | 4 | NULL | 116404108 | Using index | +----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+ 1 row in set (0.01 sec)

mysql> select count() from lineorder; +-----------+ | count() | +-----------+ | 119994609 | +-----------+ 1 row in set (36.49 sec)

Full table scan

mysql> explain select count(*) from lineorder where LO_OrderDateKey + LO_SuppKey is not null; +----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+ | 1 | SIMPLE | lineorder | ALL | NULL | NULL | NULL | NULL | 116404108 | Using where | +----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+ 1 row in set (0.00 sec)

mysql> select count(_) from lineorder where LO_OrderDateKey + LOSuppKey is not null; +-----------+ | count() | +-----------+ | 119994609 | +-----------+ 1 row in set (1 min 10.86 sec)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/facebook/mysql-5.6/issues/369#issuecomment-257124702, or mute the thread https://github.com/notifications/unsubscribe-auth/ABkKTZT2Bty8b4s0PLyrcwkU9Jq50Ubaks5q4-ivgaJpZM4KkAge .

Mark Callaghan mdcallag@gmail.com

greenlion commented 7 years ago

Attached is the output from several SHOW ENGINE ROCKSDB STATUS commands executed serially while the point query is running. status.txt

greenlion commented 7 years ago

Interestingly:

mysql> show global variables like '%rock%checksum%';
+----------------------------------+-------+
| Variable_name                    | Value |
+----------------------------------+-------+
| rocksdb_checksums_pct            | 100   |
| rocksdb_store_checksums          | OFF   |
| rocksdb_verify_checksums         | OFF   |
+----------------------------------+-------+
8 rows in set (0.00 sec)
mdcallag commented 7 years ago

Maybe this helps with those variable names - https://github.com/facebook/mysql-5.6/issues/374

Not sure why the write buffer is 4M for you, or for me -> https://github.com/facebook/mysql-5.6/issues/375

greenlion commented 7 years ago

Thanks for the info. FYI, I am about to do a round of testing on a table with a primary key to see if that makes any difference.

gunnarku commented 7 years ago

Please re-open if this is still an issue. We added more data about RocksDB engine status recently as well.