Closed greenlion closed 7 years ago
From SHOW ENGINE ROCKSDB STATUS. There is some detail, but not enough, in https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
This is per column family and some of the counters are probably non-zero for you.
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
We'll update the "data loading" wiki after fixing all known bulk loading issues, but general recommendation is all secondary indexes should be added after loading data. In your case, consider the followings.
SET session rocksdb_bulk_load=1;
CREATE TABLE lineorder ... (primary key only)
LOAD DATA ...
ALTER TABLE lineorder ADD INDEX ...., ADD INDEX ...;
This is because secondary index modifications during load data will cause massive RocksDB compactions. If you monitor rocksdb_rows_inserted and iostat -xm, you'll often see high volume of writes to storage, even though rocksdb_rows_inserted was relatively lower. RocksDB also has a feature to slow down or suspend writes, if compactions can't keep up with writes -- rocksdb_rate_limiter_bytes_per_sec my.cnf variable and level0_slowdown_writes_trigger Column Family option are major parameters to control write speeds.
Fast index creation (ALTER TABLE .. ADD INDEX or CREATE INDEX) writes SST files in the bottommost RocksDB levels directly, bypassing compactions. So you won't see such stalls.
We'll update our wiki and describe these details once we fix known bugs. In your case, fast index creation will help.
Have you considered implementing ALTER TABLE .. [DISABLE|ENABLE] KEYS?
I'm trying to count the rows in the fact table (loaded with indexes in place) but RocksDB seems to be "stuck".
query is:
select count(*) from lineorder;
MySQL is using all of one core, and it isn't doing much IO:
[justin@localhost ssb-dbgen]$ iostat -kx 5
Linux 3.10.0-327.36.3.el7.x86_64 (localhost.localdomain) 10/29/2016 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
3.45 0.01 0.95 0.16 0.00 95.43
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.03 0.00 15.53 0.00 0.65 0.65 0.00 0.44 0.00
sdb 0.30 1.60 5.52 10.86 563.26 4703.35 642.85 1.55 94.75 0.77 142.52 0.87 1.43
sdc 0.00 0.00 0.00 0.00 0.02 0.00 14.83 0.00 3.64 3.64 0.00 3.41 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
12.07 0.00 1.93 0.55 0.00 85.44
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 268.80 0.20 1552.80 0.80 11.55 0.05 0.17 0.17 0.00 0.17 4.60
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
12.36 0.00 1.93 0.53 0.00 85.18
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 261.80 0.00 1464.80 0.00 11.19 0.04 0.17 0.17 0.00 0.17 4.32
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
12.22 0.00 1.86 0.50 0.00 85.42
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 260.80 0.00 1464.80 0.00 11.23 0.04 0.16 0.16 0.00 0.16 4.06
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
12.60 0.00 2.03 0.50 0.00 84.86
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.60 245.40 3.00 1343.20 18.00 10.96 0.05 0.18 0.17 1.53 0.17 4.14
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
avg-cpu: %user %nice %system %iowait %steal %idle
12.38 0.00 2.29 0.48 0.00 84.85
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 235.60 1.40 1313.60 584.80 16.02 0.05 0.20 0.19 3.00 0.18 4.38
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
This is on a consumer grade SanDisk SSD. It can easily sustain higher IO rates.
The data doesn't appear to be buffered by MySQL:
[justin@localhost ssb-dbgen]$ free -m
total used free shared buff/cache available
Mem: 11753 1719 145 12 9888 9698
Swap: 999 459 540
[justin@localhost ssb-dbgen]$ sudo ./pmp
[sudo] password for justin:
2 pthread_cond_wait@@GLIBC_2.3.2,inline_mysql_cond_wait,at,block_pthread=true),do_handle_one_connection,handle_one_connection,start_thread,clone
2 pthread_cond_wait@@GLIBC_2.3.2,ConditionWait,thread_id=thread_id@entry=0),rocksdb::BGThreadWrapper,start_thread,clone
1 Slow_CRC32,pointer>,,out>,,Value,buf=0x7facd0b67ef0,footer=...,,rocksdb::(anonymous,Create,index_reader=index_reader@entry=0x7facf4a46c10,,rocksdb::BlockBasedTable::NewIndexIterator,rocksdb::BlockBasedTable::Get,rocksdb::TableCache::Get,rocksdb::Version::Get,rocksdb::DBImpl::GetImpl,rocksdb::DBImpl::Get,rocksdb::WriteBatchWithIndex::GetFromBatchAndDB,rocksdb::TransactionBaseImpl::Get,myrocks::Rdb_transaction_impl::get,myrocks::ha_rocksdb::get_row_by_rowid,get_row_by_rowid,keyno=<optimized,index_next_with_direction,buf=0x7facd000acf8,handle_unordered_next,buf=0x7facd000acf8,join_read_next,sub_select,do_select,at,mysql_execute_select,tables=0x7facd0005260,,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,do_handle_one_connection,handle_one_connection,start_thread,clone
1 sigwait,signal_hand,start_thread,clone
1 sigwaitinfo,timer_notify_thread_func,start_thread,clone
1 pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,<myrocks::rdb_drop_idx_thread>),myrocks::Rdb_thread::thread_func,start_thread,clone
1 pthread_cond_timedwait@@GLIBC_2.3.2,inline_mysql_cond_timedwait,<myrocks::rdb_bg_thread>),myrocks::Rdb_thread::thread_func,start_thread,clone
1 poll,handle_connections_sockets,handle_connections_sockets_all,argv=0x1f90670),__libc_start_main,_start
1
perf
85.37% my-oneconnectio mysqld [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>
7.21% my-oneconnectio [kernel.kallsyms] [k] copy_user_generic_string
0.51% my-oneconnectio [kernel.kallsyms] [k] __find_get_page
0.46% mysqld [kernel.kallsyms] [k] native_write_msr_safe
0.46% my-oneconnectio [kernel.kallsyms] [k] file_read_actor
0.45% my-oneconnectio [kernel.kallsyms] [k] put_page
0.21% mysqld [kernel.kallsyms] [k] native_read_msr_safe
0.16% my-oneconnectio mysqld [.] rocksdb::BlockIter::BinarySeek
0.16% my-oneconnectio [kernel.kallsyms] [k] generic_file_aio_read
0.15% mysqld libpthread-2.17.so [.] pthread_cond_timedwait@@GLIBC_2.3.2
0.14% my-oneconnectio [kernel.kallsyms] [k] __radix_tree_lookup
0.12% my-oneconnectio libc-2.17.so [.] __memcmp_sse4_1
0.12% mysqld [vdso] [.] __vdso_clock_gettime
0.10% my-oneconnectio libc-2.17.so [.] _int_malloc
0.10% my-oneconnectio [vdso] [.] __vdso_gettimeofday
0.09% mysqld [kernel.kallsyms] [k] __audit_syscall_exit
0.08% my-oneconnectio mysqld [.] rocksdb::HistogramStat::Add
0.08% my-oneconnectio libc-2.17.so [.] __memcpy_ssse3
0.08% mysqld [kernel.kallsyms] [k] native_read_tsc
0.08% my-oneconnectio libc-2.17.so [.] malloc_consolidate
0.08% mysqld [kernel.kallsyms] [k] get_futex_key_refs.isra.11
0.07% my-oneconnectio libc-2.17.so [.] malloc
0.07% mysqld [kernel.kallsyms] [k] futex_wait_queue_me
0.07% my-oneconnectio libc-2.17.so [.] _int_free
0.06% my-oneconnectio mysqld [.] rocksdb::BlockIter::Seek
0.06% mysqld [kernel.kallsyms] [k] system_call
0.06% my-oneconnectio [kernel.kallsyms] [k] mark_page_accessed
0.06% mysqld [kernel.kallsyms] [k] system_call_after_swapgs
0.06% my-oneconnectio mysqld [.] rocksdb::StatisticsImpl::recordTick
0.06% mysqld [kernel.kallsyms] [k] _raw_spin_lock
0.06% my-oneconnectio mysqld [.] rocksdb::InternalKeyComparator::Compare
0.06% mysqld [kernel.kallsyms] [k] __schedule
0.05% mysqld [kernel.kallsyms] [k] cpuacct_charge
0.05% mysqld [kernel.kallsyms] [k] hash_futex
0.05% mysqld [kernel.kallsyms] [k] futex_wait
0.05% my-oneconnectio libpthread-2.17.so [.] pthread_mutex_unlock
zoom into ::Fast_CRC32
85.37% my-oneconnectio mysqld [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32> ▒
7.21% my-oneconnectio [kernel.kallsyms] [k] copy_user_generic_string ▒
0.51% my-oneconnectio [kernel.kallsyms] [k] __find_get_page ▒
0.46% my-oneconnectio [kernel.kallsyms] [k] file_read_actor ▒
0.45% my-oneconnectio [kernel.kallsyms] [k] put_page ▒
0.16% my-oneconnectio mysqld [.] rocksdb::BlockIter::BinarySeek ▒
0.16% my-oneconnectio [kernel.kallsyms] [k] generic_file_aio_read ▒
0.14% my-oneconnectio [kernel.kallsyms] [k] __radix_tree_lookup ▒
0.12% my-oneconnectio libc-2.17.so [.] __memcmp_sse4_1 ▒
0.10% my-oneconnectio libc-2.17.so [.] _int_malloc ▒
0.10% my-oneconnectio [vdso] [.] __vdso_gettimeofday ▒
0.08% my-oneconnectio mysqld [.] rocksdb::HistogramStat::Add ▒
0.08% my-oneconnectio libc-2.17.so [.] __memcpy_ssse3 ▒
0.08% my-oneconnectio libc-2.17.so [.] malloc_consolidate ▒
0.07% my-oneconnectio libc-2.17.so [.] malloc
zoom into copy_user_generic_string
7.21% my-oneconnectio [k] copy_user_generic_string ▒
0.51% my-oneconnectio [k] __find_get_page ▒
0.46% my-oneconnectio [k] file_read_actor ▒
0.45% my-oneconnectio [k] put_page ▒
0.16% my-oneconnectio [k] generic_file_aio_read ▒
0.14% my-oneconnectio [k] __radix_tree_lookup ▒
0.06% my-oneconnectio [k] mark_page_accessed ▒
0.04% my-oneconnectio [k] _cond_resched ▒
0.04% my-oneconnectio [k] radix_tree_lookup_slot ▒
0.04% my-oneconnectio [k] native_write_msr_safe
mysql> show engine rocksdb status\G
*************************** 1. row ***************************
Type: DBSTATS
Name: rocksdb
Status:
** DB Stats **
Uptime(secs): 57886.6 total, 855.1 interval
Cumulative writes: 613K writes, 611M keys, 612K commit groups, 1.0 writes per commit group, ingest: 18.76 GB, 0.33 MB/s
Cumulative WAL: 613K writes, 10 syncs, 55789.18 writes per sync, written: 18.76 GB, 0.33 MB/s
Cumulative stall: 03:13:20.714 H:M:S, 20.0 percent
Interval writes: 0 writes, 0 keys, 0 commit groups, 0.0 writes per commit group, ingest: 0.00 MB, 0.00 MB/s
Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
** Level 0 read latency histogram (micros):
Count: 4899572 Average: 1.9046 StdDev: 0.71
Min: 0 Median: 1.4151 Max: 89
Percentiles: P50: 1.42 P75: 1.75 P99: 3.38 P99.9: 10.70 P99.99: 14.96
------------------------------------------------------
[ 0, 1 ) 910537 18.584% 18.584% ####
[ 1, 2 ) 3708541 75.691% 94.275% ###############
[ 2, 3 ) 221917 4.529% 98.804% #
[ 3, 4 ) 25533 0.521% 99.326%
[ 4, 5 ) 16139 0.329% 99.655%
[ 5, 6 ) 5433 0.111% 99.766%
[ 6, 7 ) 1816 0.037% 99.803%
[ 7, 8 ) 1466 0.030% 99.833%
[ 8, 9 ) 1034 0.021% 99.854%
[ 9, 10 ) 976 0.020% 99.874%
[ 10, 12 ) 3637 0.074% 99.948%
[ 12, 14 ) 1845 0.038% 99.986%
[ 14, 16 ) 432 0.009% 99.995%
[ 16, 18 ) 145 0.003% 99.998%
[ 18, 20 ) 33 0.001% 99.998%
[ 20, 25 ) 37 0.001% 99.999%
[ 25, 30 ) 17 0.000% 99.999%
[ 30, 35 ) 6 0.000% 99.999%
[ 35, 40 ) 9 0.000% 100.000%
[ 40, 45 ) 3 0.000% 100.000%
[ 45, 50 ) 8 0.000% 100.000%
[ 50, 60 ) 2 0.000% 100.000%
[ 60, 70 ) 4 0.000% 100.000%
[ 80, 90 ) 2 0.000% 100.000%
** Level 1 read latency histogram (micros):
Count: 44621056 Average: 1.8442 StdDev: 3.07
Min: 0 Median: 1.3628 Max: 7111
Percentiles: P50: 1.36 P75: 1.69 P99: 2.77 P99.9: 11.31 P99.99: 131.42
------------------------------------------------------
[ 0, 1 ) 10087190 22.606% 22.606% #####
[ 1, 2 ) 33692106 75.507% 98.114% ###############
[ 2, 3 ) 516302 1.157% 99.271%
[ 3, 4 ) 111313 0.249% 99.520%
[ 4, 5 ) 72270 0.162% 99.682%
[ 5, 6 ) 23452 0.053% 99.735%
[ 6, 7 ) 12091 0.027% 99.762%
[ 7, 8 ) 14101 0.032% 99.793%
[ 8, 9 ) 11069 0.025% 99.818%
[ 9, 10 ) 12148 0.027% 99.845%
[ 10, 12 ) 37118 0.083% 99.929%
[ 12, 14 ) 15617 0.035% 99.964%
[ 14, 16 ) 3009 0.007% 99.970%
[ 16, 18 ) 1449 0.003% 99.974%
[ 18, 20 ) 333 0.001% 99.974%
[ 20, 25 ) 443 0.001% 99.975%
[ 25, 30 ) 1047 0.002% 99.978%
[ 30, 35 ) 950 0.002% 99.980%
[ 35, 40 ) 360 0.001% 99.981%
[ 40, 45 ) 331 0.001% 99.981%
[ 45, 50 ) 331 0.001% 99.982%
[ 50, 60 ) 1893 0.004% 99.986%
[ 60, 70 ) 401 0.001% 99.987%
[ 70, 80 ) 212 0.000% 99.988%
[ 80, 90 ) 183 0.000% 99.988%
[ 90, 100 ) 190 0.000% 99.988%
[ 100, 120 ) 360 0.001% 99.989%
[ 120, 140 ) 569 0.001% 99.991%
[ 140, 160 ) 2316 0.005% 99.996%
[ 160, 180 ) 1316 0.003% 99.999%
[ 180, 200 ) 375 0.001% 100.000%
[ 200, 250 ) 137 0.000% 100.000%
[ 250, 300 ) 11 0.000% 100.000%
[ 300, 350 ) 4 0.000% 100.000%
[ 350, 400 ) 3 0.000% 100.000%
[ 400, 450 ) 2 0.000% 100.000%
[ 450, 500 ) 1 0.000% 100.000%
[ 500, 600 ) 2 0.000% 100.000%
[ 600, 700 ) 3 0.000% 100.000%
[ 700, 800 ) 2 0.000% 100.000%
[ 800, 900 ) 4 0.000% 100.000%
[ 900, 1000 ) 3 0.000% 100.000%
[ 1000, 1200 ) 5 0.000% 100.000%
[ 1200, 1400 ) 6 0.000% 100.000%
[ 1400, 1600 ) 5 0.000% 100.000%
[ 1600, 1800 ) 2 0.000% 100.000%
[ 1800, 2000 ) 6 0.000% 100.000%
[ 2000, 2500 ) 3 0.000% 100.000%
[ 2500, 3000 ) 4 0.000% 100.000%
[ 3000, 3500 ) 5 0.000% 100.000%
[ 3500, 4000 ) 1 0.000% 100.000%
[ 5000, 6000 ) 1 0.000% 100.000%
[ 7000, 8000 ) 1 0.000% 100.000%
** Level 2 read latency histogram (micros):
Count: 66811738 Average: 3.4027 StdDev: 58.92
Min: 0 Median: 1.3528 Max: 104140
Percentiles: P50: 1.35 P75: 1.71 P99: 57.07 P99.9: 135.85 P99.99: 235.76
------------------------------------------------------
[ 0, 1 ) 16717277 25.021% 25.021% #####
[ 1, 2 ) 47300070 70.796% 95.818% ##############
[ 2, 3 ) 788252 1.180% 96.997%
[ 3, 4 ) 164175 0.246% 97.243%
[ 4, 5 ) 110904 0.166% 97.409%
[ 5, 6 ) 35953 0.054% 97.463%
[ 6, 7 ) 19794 0.030% 97.492%
[ 7, 8 ) 20841 0.031% 97.524%
[ 8, 9 ) 16181 0.024% 97.548%
[ 9, 10 ) 18388 0.028% 97.575%
[ 10, 12 ) 54188 0.081% 97.657%
[ 12, 14 ) 21559 0.032% 97.689%
[ 14, 16 ) 4575 0.007% 97.696%
[ 16, 18 ) 2116 0.003% 97.699%
[ 18, 20 ) 496 0.001% 97.700%
[ 20, 25 ) 740 0.001% 97.701%
[ 25, 30 ) 28022 0.042% 97.743%
[ 30, 35 ) 23735 0.036% 97.778%
[ 35, 40 ) 4377 0.007% 97.785%
[ 40, 45 ) 1955 0.003% 97.788%
[ 45, 50 ) 146264 0.219% 98.007%
[ 50, 60 ) 939398 1.406% 99.413%
[ 60, 70 ) 132744 0.199% 99.611%
[ 70, 80 ) 105533 0.158% 99.769%
[ 80, 90 ) 36878 0.055% 99.824%
[ 90, 100 ) 19838 0.030% 99.854%
[ 100, 120 ) 24166 0.036% 99.890%
[ 120, 140 ) 8210 0.012% 99.903%
[ 140, 160 ) 10543 0.016% 99.918%
[ 160, 180 ) 15630 0.023% 99.942%
[ 180, 200 ) 21667 0.032% 99.974%
[ 200, 250 ) 14802 0.022% 99.996%
[ 250, 300 ) 484 0.001% 99.997%
[ 300, 350 ) 81 0.000% 99.997%
[ 350, 400 ) 50 0.000% 99.997%
[ 400, 450 ) 58 0.000% 99.997%
[ 450, 500 ) 30 0.000% 99.997%
[ 500, 600 ) 100 0.000% 99.998%
[ 600, 700 ) 58 0.000% 99.998%
[ 700, 800 ) 58 0.000% 99.998%
[ 800, 900 ) 70 0.000% 99.998%
[ 900, 1000 ) 35 0.000% 99.998%
[ 1000, 1200 ) 92 0.000% 99.998%
[ 1200, 1400 ) 67 0.000% 99.998%
[ 1400, 1600 ) 48 0.000% 99.998%
[ 1600, 1800 ) 45 0.000% 99.998%
[ 1800, 2000 ) 61 0.000% 99.998%
[ 2000, 2500 ) 168 0.000% 99.999%
[ 2500, 3000 ) 121 0.000% 99.999%
[ 3000, 3500 ) 101 0.000% 99.999%
[ 3500, 4000 ) 60 0.000% 99.999%
[ 4000, 4500 ) 37 0.000% 99.999%
[ 4500, 5000 ) 60 0.000% 99.999%
[ 5000, 6000 ) 50 0.000% 99.999%
[ 6000, 7000 ) 56 0.000% 99.999%
[ 7000, 8000 ) 38 0.000% 99.999%
[ 8000, 9000 ) 48 0.000% 99.999%
[ 9000, 10000 ) 38 0.000% 99.999%
[ 10000, 12000 ) 55 0.000% 100.000%
[ 12000, 14000 ) 30 0.000% 100.000%
[ 14000, 16000 ) 26 0.000% 100.000%
[ 16000, 18000 ) 26 0.000% 100.000%
[ 18000, 20000 ) 36 0.000% 100.000%
[ 20000, 25000 ) 30 0.000% 100.000%
[ 25000, 30000 ) 27 0.000% 100.000%
[ 30000, 35000 ) 29 0.000% 100.000%
[ 35000, 40000 ) 19 0.000% 100.000%
[ 40000, 45000 ) 9 0.000% 100.000%
[ 45000, 50000 ) 13 0.000% 100.000%
[ 50000, 60000 ) 20 0.000% 100.000%
[ 60000, 70000 ) 8 0.000% 100.000%
[ 70000, 80000 ) 1 0.000% 100.000%
[ 80000, 90000 ) 9 0.000% 100.000%
[ 90000, 100000 ) 12 0.000% 100.000%
[ 100000, 120000 ) 3 0.000% 100.000%
** Level 3 read latency histogram (micros):
Count: 10252766 Average: 19.5681 StdDev: 196.14
Min: 0 Median: 1.6620 Max: 162157
Percentiles: P50: 1.66 P75: 46.39 P99: 115.02 P99.9: 224.72 P99.99: 1698.46
------------------------------------------------------
[ 0, 1 ) 1410496 13.757% 13.757% ###
[ 1, 2 ) 5613165 54.748% 68.505% ###########
[ 2, 3 ) 263515 2.570% 71.075% #
[ 3, 4 ) 27262 0.266% 71.341%
[ 4, 5 ) 18586 0.181% 71.522%
[ 5, 6 ) 7120 0.069% 71.592%
[ 6, 7 ) 4294 0.042% 71.634%
[ 7, 8 ) 3568 0.035% 71.669%
[ 8, 9 ) 2188 0.021% 71.690%
[ 9, 10 ) 2354 0.023% 71.713%
[ 10, 12 ) 7704 0.075% 71.788%
[ 12, 14 ) 3751 0.037% 71.825%
[ 14, 16 ) 1191 0.012% 71.836%
[ 16, 18 ) 555 0.005% 71.842%
[ 18, 20 ) 262 0.003% 71.844%
[ 20, 25 ) 575 0.006% 71.850%
[ 25, 30 ) 94175 0.919% 72.768%
[ 30, 35 ) 66727 0.651% 73.419%
[ 35, 40 ) 10417 0.102% 73.521%
[ 40, 45 ) 126835 1.237% 74.758%
[ 45, 50 ) 89476 0.873% 75.630%
[ 50, 60 ) 1756932 17.136% 92.767% ###
[ 60, 70 ) 269158 2.625% 95.392% #
[ 70, 80 ) 182039 1.776% 97.167%
[ 80, 90 ) 95225 0.929% 98.096%
[ 90, 100 ) 49150 0.479% 98.576%
[ 100, 120 ) 57954 0.565% 99.141%
[ 120, 140 ) 20915 0.204% 99.345%
[ 140, 160 ) 10467 0.102% 99.447%
[ 160, 180 ) 16682 0.163% 99.610%
[ 180, 200 ) 21838 0.213% 99.823%
[ 200, 250 ) 16055 0.157% 99.979%
[ 250, 300 ) 601 0.006% 99.985%
[ 300, 350 ) 57 0.001% 99.986%
[ 350, 400 ) 43 0.000% 99.986%
[ 400, 450 ) 17 0.000% 99.986%
[ 450, 500 ) 17 0.000% 99.986%
[ 500, 600 ) 94 0.001% 99.987%
[ 600, 700 ) 59 0.001% 99.988%
[ 700, 800 ) 29 0.000% 99.988%
[ 800, 900 ) 22 0.000% 99.988%
[ 900, 1000 ) 34 0.000% 99.989%
[ 1000, 1200 ) 42 0.000% 99.989%
[ 1200, 1400 ) 43 0.000% 99.989%
[ 1400, 1600 ) 34 0.000% 99.990%
[ 1600, 1800 ) 36 0.000% 99.990%
[ 1800, 2000 ) 39 0.000% 99.991%
[ 2000, 2500 ) 212 0.002% 99.993%
[ 2500, 3000 ) 197 0.002% 99.995%
[ 3000, 3500 ) 76 0.001% 99.995%
[ 3500, 4000 ) 58 0.001% 99.996%
[ 4000, 4500 ) 30 0.000% 99.996%
[ 4500, 5000 ) 34 0.000% 99.996%
[ 5000, 6000 ) 45 0.000% 99.997%
[ 6000, 7000 ) 31 0.000% 99.997%
[ 7000, 8000 ) 33 0.000% 99.998%
[ 8000, 9000 ) 20 0.000% 99.998%
[ 9000, 10000 ) 14 0.000% 99.998%
[ 10000, 12000 ) 32 0.000% 99.998%
[ 12000, 14000 ) 28 0.000% 99.998%
[ 14000, 16000 ) 18 0.000% 99.999%
[ 16000, 18000 ) 9 0.000% 99.999%
[ 18000, 20000 ) 14 0.000% 99.999%
[ 20000, 25000 ) 30 0.000% 99.999%
[ 25000, 30000 ) 12 0.000% 99.999%
[ 30000, 35000 ) 8 0.000% 99.999%
[ 35000, 40000 ) 13 0.000% 99.999%
[ 40000, 45000 ) 7 0.000% 100.000%
[ 50000, 60000 ) 10 0.000% 100.000%
[ 60000, 70000 ) 5 0.000% 100.000%
[ 70000, 80000 ) 16 0.000% 100.000%
[ 80000, 90000 ) 13 0.000% 100.000%
[ 90000, 100000 ) 1 0.000% 100.000%
[ 120000, 140000 ) 1 0.000% 100.000%
[ 160000, 180000 ) 1 0.000% 100.000%
** Level 6 read latency histogram (micros):
Count: 29 Average: 58.7586 StdDev: 116.89
Min: 0 Median: 2.2500 Max: 445
Percentiles: P50: 2.25 P75: 87.50 P99: 442.75 P99.9: 445.00 P99.99: 445.00
------------------------------------------------------
[ 0, 1 ) 12 41.379% 41.379% ########
[ 1, 2 ) 2 6.897% 48.276% #
[ 2, 3 ) 2 6.897% 55.172% #
[ 3, 4 ) 2 6.897% 62.069% #
[ 4, 5 ) 1 3.448% 65.517% #
[ 5, 6 ) 2 6.897% 72.414% #
[ 80, 90 ) 1 3.448% 75.862% #
[ 90, 100 ) 1 3.448% 79.310% #
[ 100, 120 ) 1 3.448% 82.759% #
[ 120, 140 ) 2 6.897% 89.655% #
[ 200, 250 ) 1 3.448% 93.103% #
[ 400, 450 ) 2 6.897% 100.000% #
*************************** 2. row ***************************
Type: CF_COMPACTION
Name: __system__
Status:
** Compaction Stats [__system__] **
Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 2/0 0.03 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.9 0 73 0.003 0 0
L1 1/0 0.02 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.3 24.2 6.3 0 18 0.004 18K 11K
Sum 3/0 0.06 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.3 5.0 5.2 0 91 0.003 18K 11K
Int 0/0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Uptime(secs): 57886.6 total, 57886.6 interval
Flush(GB): cumulative 0.001, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.3 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
*************************** 3. row ***************************
Type: CF_COMPACTION
Name: cf_link_pk
Status:
** Compaction Stats [cf_link_pk] **
Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sum 0/0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Int 0/0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Uptime(secs): 57886.6 total, 57886.6 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
*************************** 4. row ***************************
Type: CF_COMPACTION
Name: default
Status:
** Compaction Stats [default] **
Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 1/0 0.94 0.2 0.0 0.0 0.0 18.3 18.3 0.0 0.0 0.0 56.9 329 8360 0.039 0 0
L1 7/0 200.02 0.8 169.4 18.3 151.2 168.6 17.5 0.0 9.2 26.6 26.5 6520 483 13.500 4807M 286K
L2 42/0 2540.07 1.0 248.4 16.8 231.6 247.9 16.3 0.4 14.7 43.0 42.9 5919 262 22.591 6449M 0
L3 225/0 14167.48 0.6 39.9 12.1 27.9 39.6 11.7 2.1 3.3 28.9 28.6 1415 140 10.108 1765M 3713K
L6 4/0 151.28 0.0 0.0 0.0 0.0 0.2 0.2 0.0 0.0 0.0 6795.9 0 5 0.007 0 0
Sum 279/0 17059.78 0.0 457.8 47.2 410.6 474.6 63.9 2.6 25.6 33.1 34.3 14183 9250 1.533 13G 4000K
Int 0/0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Uptime(secs): 57886.6 total, 57886.6 interval
Flush(GB): cumulative 18.273, interval 0.000
AddFile(GB): cumulative 0.234, interval 0.000
AddFile(Total Files): cumulative 5, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 2642556, interval 0
Cumulative compaction: 474.58 GB write, 8.40 MB/s write, 457.83 GB read, 8.10 MB/s read, 14183.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 1181 level0_slowdown, 385 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 3 memtable_compaction, 0 memtable_slowdown, interval 0 total count
*************************** 5. row ***************************
Type: CF_COMPACTION
Name: rev:cf_link_id1_type
Status:
** Compaction Stats [rev:cf_link_id1_type] **
Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sum 0/0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Int 0/0 0.00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0
Uptime(secs): 57886.6 total, 57886.6 interval
Flush(GB): cumulative 0.000, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
*************************** 6. row ***************************
Type: Memory_Stats
Name: rocksdb
Status:
MemTable Total: 7694016
MemTable Unflushed: 96952
Table Readers Total: 0
Cache Total: 9931621
Default Cache Capacity: 0
6 rows in set (0.01 sec)
It is reading about 2K rows/second from the table. This query would take ~55500 seconds at the current rate of reads.
The .rocksdb folder is 17GB. Raw data was ~12GB of data.
Loading data without indexes was significantly faster, and the count completes quickly after loading. mysql> select count() from lineorder; +-----------+ | count() | +-----------+ | 119994608 | +-----------+ 1 row in set (2 min 3.76 sec)
Size on disk: 9.9GB
I will add the indexes and continue testing.
Can you grep $datadir/.rocksdb/LOG for "Fast CRC32 supported"
First guess, fast crc32 was not used, fast means HW accelerated based on seeing this in PMP -> "Slow_CRC32" and this in perf -> 85.37% my-oneconnectio mysqld [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>
Note there is a compile-time and run-time choice that determines whether fast crc32 is used. At compile time, support is enabled when SSE4_2 and LP64 are defined: https://github.com/facebook/rocksdb/blob/master/util/crc32c.cc#L318
Runtime check that determines whether it can be used is at: https://github.com/facebook/rocksdb/blob/master/util/crc32c.cc#L391
There is no mention of CRC (I did case insensitive search) in the LOG file.
Here is processor info:
processor : 7
vendor_id : AuthenticAMD
cpu family : 21
model : 2
model name : AMD FX(tm)-8350 Eight-Core Processor
stepping : 0
microcode : 0x600081f
cpu MHz : 4013.463
cache size : 2048 KB
physical id : 0
siblings : 8
core id : 7
cpu cores : 4
apicid : 23
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips : 8026.92
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro
Your CPU supports sse4_2. On my Ubuntu 16.04 VM that runs on my Mac laptop with core i7 pure RocksDB builds (https://github.com/facebook/rocksdb) get fast crc32.
The difference below occurs because the pure-RocksDB build on my laptop is done with -the -march=native flag. The embedded RocksDB build used by FB MySQL does not use the -march=native flag. I will file a separate issue for the build problem. I have encountered this before.
This is output from the preprocessor for util/crc32c.cc
static inline void Fast_CRC32(uint64_t* l, uint8_t const **p) {
*l = _mm_crc32_u64(*l, LE_LOAD64(*p));
*p += 8;
# 332 "util/crc32c.cc"
}
static bool isSSE42() {
uint32_t c_;
uint32_t d_;
__asm__("cpuid" : "=c"(c_), "=d"(d_) : "a"(1) : "ebx");
return c_ & (1U << 20);
}
Same VM, but from the embedded RocksDB build done as part of FB MySQL, fast crc32 is not enabled at compile-time:
static inline void Fast_CRC32(uint64_t* l, uint8_t const **p) {
# 330 "/home/mdcallag/git/fbmysql/rocksdb/util/crc32c.cc"
Slow_CRC32(l, p);
}
static bool isSSE42() {
uint32_t c_;
uint32_t d_;
__asm__("cpuid" : "=c"(c_), "=d"(d_) : "a"(1) : "ebx");
return c_ & (1U << 20);
}
Compiler command lines from my Ubuntu VM. First is RocksDB embedded by FB MySQL:
/usr/bin/c++ -DHAVE_CONFIG_H -DHAVE_SCHED_GETCPU=1 -DOS_LINUX -DROCKSDB_LIB_IO_POSIX -DROCKSDB_PLATFORM_POSIX -DZLIB -I/home/mdcallag/git/fbmysql/include -I/home/mdcallag/git/fbmysql/rocksdb -I/home/mdcallag/git/fbmysql/rocksdb/include -I/home/mdcallag/git/fbmysql/rocksdb/third-party/gtest-1.7.0/fused-src -I/home/mdcallag/git/fbmysql/sql -I/home/mdcallag/git/fbmysql/regex -I/home/mdcallag/git/fbmysql/zlib -Wall -Wextra -Wformat-security -Woverloaded-virtual -Wno-unused-parameter -std=c++11 -O3 -g -fabi-version=2 -fno-omit-frame-pointer -fno-strict-aliasing -DDBUG_OFF -c /home/mdcallag/git/fbmysql/rocksdb/util/crc32c.cc
Next is pure-RocksDB
g++ -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wno-unused-parameter -I. -I./include -std=c++11 -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DOS_LINUX -fno-builtin-memcmp -DROCKSDB_FALLOCATE_PRESENT -DSNAPPY -DGFLAGS=gflags -DZLIB -DBZIP2 -DLZ4 -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -march=native "-DROCKSDB_JEMALLOC" -isystem ./third-party/gtest-1.7.0/fused-src -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -DNDEBUG -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -c util/crc32c.cc
Can you determine whether rocksdb/util/crc32.cc is compiled with -march=native?
https://github.com/facebook/rocksdb/issues/1416 prevents whether CRC32 is enabled from getting logged to $datadir/.rocksdb/LOG. The workaround is to change the log level. Default is error -> "rocksdb_info_log_level error_level". Add this to my.cnf: rocksdb_info_log_level=info_level
And then grep again. I see this on my VM:
016/10/29-08:48:36.270223 7f30a69f1940 Compression algorithms supported:
2016/10/29-08:48:36.270226 7f30a69f1940 Snappy supported: 0
2016/10/29-08:48:36.270228 7f30a69f1940 Zlib supported: 1
2016/10/29-08:48:36.270231 7f30a69f1940 Bzip supported: 0
2016/10/29-08:48:36.270233 7f30a69f1940 LZ4 supported: 0
2016/10/29-08:48:36.270236 7f30a69f1940 Fast CRC32 supported: 0
I will do so ASAP. I have some other things on my plate ATM.
Workaround for getting fast crc32 support is described in https://github.com/facebook/mysql-5.6/issues/370#issuecomment-257102068
After adding indexes, the query now takes forever. I'll try to get fast crc32 support per @mdcallag suggestion and see what happens. But I think it is important to note that table scans are impossible without fast CRC32 support. InnoDB is slower with native checksums but not so slow that the engine becomes unusable.
just fyi, after adding the indexes the CRC is the top consumer of time.
75.93% my-oneconnectio mysqld [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>
5.94% my-oneconnectio [kernel.kallsyms] [k] copy_user_generic_string
2.36% mysqld [kernel.kallsyms] [k] native_write_msr_safe
1.18% mysqld [kernel.kallsyms] [k] native_read_msr_safe
0.75% mysqld libpthread-2.17.so [.] pthread_cond_timedwait@@GLIBC_2.3.2
0.57% mysqld [vdso] [.] __vdso_clock_gettime
Thanks. I will try scans with & without HW crc32 on one of my servers to understand the impact. I might not have much time for debugging the next 2 weeks because of work.
I wonder if we have people using RocksDB on non x86 platforms and whether we will get faster HW crc32 there, or a faster software crc32 for them.
I took a look in crc32c.cc - shouldn't the above stack show Slow_CRC32 if the slower CRC32 is being used? As a test I added -DFORCE_CRC32=1 to CMAKE_CXX_FLAGS and I put an #ifdef in to make sure the _64 function was being called. It compiled fine, but I still see all the time in the Fast_CRC32 function.
Just to be extra sure I modified rocksdb::crc32c.c::Fast_CRC32 to be simply:
static inline void Fast_CRC32(uint64_t* l, uint8_t const **p) {
*l = _mm_crc32_u64(*l, LE_LOAD64(*p));
*p += 8;
}
Thus FastCRC is being used for sure. For some reason, after adding indexes to this data, the server gets bogged down by excessive crc32 operations when reading from the index.
index scan based read:
mysql> explain select count(*) From lineorder;
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
| 1 | SIMPLE | lineorder | index | NULL | LO_OrderDateKey | 4 | NULL | 119966000 | Using index |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
1 row in set (0.00 sec)
If I force a full table scan instead of a full index scan, the query returns results more quickly:
mysql> explain select count(*) From lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
| 1 | SIMPLE | lineorder | ALL | NULL | NULL | NULL | NULL | 119966000 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
1 row in set (0.00 sec)
mysql> select count(*) From lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+-----------+
| count(*) |
+-----------+
| 119994608 |
+-----------+
1 row in set (3 min 43.48 sec)
perf output for full table scan
3.70% mysqld mysqld [.] rocksdb::MergingIterator::Next ▒
3.41% mysqld [kernel.kallsyms] [k] native_write_msr_safe ▒
3.33% mysqld libc-2.17.so [.] __memcpy_ssse3 ▒
3.31% mysqld [vdso] [.] __vdso_getcpu ▒
3.09% mysqld mysqld [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32> ▒
2.79% mysqld mysqld [.] myrocks::ha_rocksdb::convert_record_from_storage_format ▒
2.29% mysqld mysqld [.] myrocks::ha_rocksdb::rnd_next_with_direction ▒
2.12% mysqld mysqld [.] rocksdb::DBIter::Next ▒
1.95% mysqld mysqld [.] rocksdb::BlockIter::Next
Excellent questions but I have slide decks for 2 talks that need to be made better.
When I return I will try your tests for InnoDB and RocksDB with and without fast crc32. The only thing I don't have is an AMD CPU.
Are you saying my code modification is using slow CRC? I changed the code to only call the SSE function, and the perf output would have shown slow_crc32 if the #ifdef wasn't running the hardware function.
I'll hook up my 12 core i970 and run the same test there. I'll let you know what I find.
As configured on the i970.
Per your workaround, I added -march=native -msse -msse4.2 to CMAKE_CXX_FLAGS
-- Running cmake version 2.8.11
-- MySQL 5.6.27
-- Packaging as: mysql-5.6.27-Linux-x86_64
-- HAVE_VISIBILITY_HIDDEN
-- OPENSSL_INCLUDE_DIR = /usr/include
-- OPENSSL_LIBRARY = /usr/lib64/libssl.so
-- CRYPTO_LIBRARY = /usr/lib64/libcrypto.so
-- OPENSSL_MAJOR_VERSION = 1
-- SSL_LIBRARIES = /usr/lib64/libssl.so;/usr/lib64/libcrypto.so;dl
-- Using cmake version 2.8.11
-- Not building NDB
-- Library webscalesqlclient depends on OSLIBS -lpthread;m;rt;/usr/lib64/libssl.so;/usr/lib64/libcrypto.so;dl
-- Library mysqlserver depends on OSLIBS -lpthread;m;rt;/usr/lib64/libssl.so;/usr/lib64/libcrypto.so;dl;crypt;aio;numa;snappy;lz4;bz2;-lrt
-- CMAKE_BUILD_TYPE: RelWithDebInfo
-- COMPILE_DEFINITIONS: HAVE_CONFIG_H
-- CMAKE_C_FLAGS: -Wall -Wextra -Wformat-security -Wwrite-strings -std=c99
-- CMAKE_CXX_FLAGS: -march=native -msse -msse4.2 -Wall -Wextra -Wformat-security -Woverloaded-virtual -Wno-unused-parameter -std=c++11
-- CMAKE_C_FLAGS_RELWITHDEBINFO: -O3 -g -fabi-version=2 -fno-omit-frame-pointer -fno-strict-aliasing -DDBUG_OFF
-- CMAKE_CXX_FLAGS_RELWITHDEBINFO: -O3 -g -fabi-version=2 -fno-omit-frame-pointer -fno-strict-aliasing -DDBUG_OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /home/justin/src/myrocks/build
processor : 11 vendor_id : GenuineIntel cpu family : 6 model : 44 model name : Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz stepping : 2 microcode : 0x13 cpu MHz : 1600.000 cache size : 12288 KB physical id : 0 siblings : 12 core id : 10 cpu cores : 6 apicid : 21 initial apicid : 21 fpu : yes fpu_exception : yes cpuid level : 11 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt aes lahf_lm ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid bogomips : 6408.21 clflush size : 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management:
Here is the perf from the index scan on the i970:
38.35% mysqld mysqld [.] rocksdb::crc32c::ExtendImpl<&rocksdb::crc32c::Fast_CRC32>
28.94% mysqld [kernel.kallsyms] [k] copy_user_generic_string
2.44% mysqld [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.47% mysqld [kernel.kallsyms] [k] put_page
1.19% mysqld [kernel.kallsyms] [k] native_read_tsc
1.08% mysqld [kernel.kallsyms] [k] delay_tsc
0.86% mysqld [kernel.kallsyms] [k] file_read_actor
0.82% mysqld [kernel.kallsyms] [k] __find_get_page
0.66% mysqld [kernel.kallsyms] [k] generic_file_aio_read
0.65% mysqld [kernel.kallsyms] [k] iowrite8
0.62% mysqld [kernel.kallsyms] [k] __radix_tree_lookup
0.49% mysqld [kernel.kallsyms] [k] __audit_syscall_exit
0.48% mysqld libc-2.17.so [.] __memcmp_sse4_1
0.47% mysqld mysqld [.] rocksdb::BlockIter::BinarySeek
Here is the diff to force fast CRC32. Fast CRC32 is in use on both of my machines, and the index scan is very slow on both AMD64 and Intel Core i7.
diff --git a/util/crc32c.cc b/util/crc32c.cc
index ce57454..d933e51 100644
--- a/util/crc32c.cc
+++ b/util/crc32c.cc
@@ -316,19 +316,8 @@ static inline void Slow_CRC32(uint64_t* l, uint8_t const **p) {
}
static inline void Fast_CRC32(uint64_t* l, uint8_t const **p) {
-#ifdef __SSE4_2__
-#ifdef __LP64__
*l = _mm_crc32_u64(*l, LE_LOAD64(*p));
*p += 8;
-#else
- *l = _mm_crc32_u32(static_cast<unsigned int>(*l), LE_LOAD32(*p));
- *p += 4;
- *l = _mm_crc32_u32(static_cast<unsigned int>(*l), LE_LOAD32(*p));
- *p += 4;
-#endif
-#else
- Slow_CRC32(l, p);
-#endif
}
template<void (*CRC32)(uint64_t*, uint8_t const**)>
@@ -389,10 +378,12 @@ static bool isSSE42() {
typedef uint32_t (*Function)(uint32_t, const char*, size_t);
static inline Function Choose_Extend() {
- return isSSE42() ? ExtendImpl<Fast_CRC32> : ExtendImpl<Slow_CRC32>;
+ // return isSSE42() ? ExtendImpl<Fast_CRC32> : ExtendImpl<Slow_CRC32>;
+ return ExtendImpl<Fast_CRC32>;
}
bool IsFastCrc32Supported() {
+ return true;
#ifdef __SSE4_2__
return isSSE42();
#else
Even a basic aggregate query takes a very long time:
mysql> explain select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
| 1 | SIMPLE | lineorder | ref | LO_OrderDateKey | LO_OrderDateKey | 4 | const | 6613 | Using index |
+----+-------------+-----------+------+-----------------+-----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql> select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----------+
| count(*) |
+----------+
| 49519 |
+----------+
1 row in set (11.43 sec)
Load time for uncompressed InnoDB table: mysql> load data infile '/home/justin/src/ssb-dbgen/lineorder.tbl' into table lineorder fields terminated by '|' lines terminated by '|\n'; Query OK, 119994608 rows affected (27 min 46.46 sec) Records: 119994608 Deleted: 0 Skipped: 0 Warnings: 0
mysql> alter table lineorder add key(LO_OrderDateKey), add key(LO_CustKey), add key(LO_PartKey), add key(LO_SuppKey); Query OK, 0 rows affected (21 min 29.79 sec) Records: 0 Duplicates: 0 Warnings: 0
InnoDB uncompressed size on disk: 20G
full index scan
mysql> explain select count(*) from lineorder;
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
| 1 | SIMPLE | lineorder | index | NULL | LO_OrderDateKey | 4 | NULL | 116404108 | Using index |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
1 row in set (0.01 sec)
mysql> select count(*) from lineorder;
+-----------+
| count(*) |
+-----------+
| 119994609 |
+-----------+
1 row in set (36.49 sec)
Full table scan
mysql> explain select count(*) from lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
| 1 | SIMPLE | lineorder | ALL | NULL | NULL | NULL | NULL | 116404108 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+
1 row in set (0.00 sec)
mysql> select count(*) from lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+-----------+
| count(*) |
+-----------+
| 119994609 |
+-----------+
1 row in set (1 min 10.86 sec)
aggregate query
mysql> select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----------+
| count(*) |
+----------+
| 49519 |
+----------+
1 row in set (0.05 sec)
InnoDB compressed table, KEY_BLOCK_SIZE=8
mysql> load data infile '/home/justin/src/ssb-dbgen/lineorder.tbl' into table lineorder fields terminated by '|' lines terminated by '|\n';Query OK, 119994608 rows affected (1 hour 26 min 51.56 sec)
Records: 119994608 Deleted: 0 Skipped: 0 Warnings: 0
mysql> alter table lineorder add key(LO_OrderDateKey), add key(LO_CustKey), add key(LO_PartKey), add key(LO_SuppKey);Query OK, 0 rows affected (36 min 53.69 sec)
Records: 0 Duplicates: 0 Warnings: 0
On disk size: 9.9GB
index scan on compress data
mysql> explain select count(*) from lineorder ;
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
| 1 | SIMPLE | lineorder | index | NULL | LO_OrderDateKey | 4 | NULL | 116426330 | Using index |
+----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+
1 row in set (0.00 sec)
mysql> select count(*) from lineorder ;
+-----------+
| count(*) |
+-----------+
| 119994608 |
+-----------+
1 row in set (54.78 sec)
FTS on compressed data
mysql> select count(*) from lineorder where LO_OrderDateKey + LO_SuppKey is not null;
+-----------+
| count(*) |
+-----------+
| 119994608 |
+-----------+
1 row in set (3 min 22.84 sec)
aggregate over compressed data
mysql> select count(*) from lineorder where LO_OrderDateKey = 19921201;
+----------+
| count(*) |
+----------+
| 49519 |
+----------+
1 row in set (0.04 sec)
Can you share SHOW ENGINE ROCKSDB STATUS output from the time when the point query is slow?
On Sat, Oct 29, 2016 at 5:45 PM, Justin Swanhart notifications@github.com wrote:
Load time for uncompressed InnoDB table: mysql> load data infile '/home/justin/src/ssb-dbgen/lineorder.tbl' into table lineorder fields terminated by '|' lines terminated by '|\n'; Query OK, 119994608 rows affected (27 min 46.46 sec) Records: 119994608 Deleted: 0 Skipped: 0 Warnings: 0
mysql> alter table lineorder add key(LO_OrderDateKey), add key(LO_CustKey), add key(LO_PartKey), add key(LO_SuppKey); Query OK, 0 rows affected (21 min 29.79 sec) Records: 0 Duplicates: 0 Warnings: 0
InnoDB uncompressed size on disk: 20G
full index scan
mysql> explain select count(*) from lineorder; +----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+ | 1 | SIMPLE | lineorder | index | NULL | LO_OrderDateKey | 4 | NULL | 116404108 | Using index | +----+-------------+-----------+-------+---------------+-----------------+---------+------+-----------+-------------+ 1 row in set (0.01 sec)
mysql> select count() from lineorder; +-----------+ | count() | +-----------+ | 119994609 | +-----------+ 1 row in set (36.49 sec)
Full table scan
mysql> explain select count(*) from lineorder where LO_OrderDateKey + LO_SuppKey is not null; +----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+ | 1 | SIMPLE | lineorder | ALL | NULL | NULL | NULL | NULL | 116404108 | Using where | +----+-------------+-----------+------+---------------+------+---------+------+-----------+-------------+ 1 row in set (0.00 sec)
mysql> select count(_) from lineorder where LO_OrderDateKey + LOSuppKey is not null; +-----------+ | count() | +-----------+ | 119994609 | +-----------+ 1 row in set (1 min 10.86 sec)
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/facebook/mysql-5.6/issues/369#issuecomment-257124702, or mute the thread https://github.com/notifications/unsubscribe-auth/ABkKTZT2Bty8b4s0PLyrcwkU9Jq50Ubaks5q4-ivgaJpZM4KkAge .
Mark Callaghan mdcallag@gmail.com
Attached is the output from several SHOW ENGINE ROCKSDB STATUS commands executed serially while the point query is running. status.txt
Interestingly:
mysql> show global variables like '%rock%checksum%';
+----------------------------------+-------+
| Variable_name | Value |
+----------------------------------+-------+
| rocksdb_checksums_pct | 100 |
| rocksdb_store_checksums | OFF |
| rocksdb_verify_checksums | OFF |
+----------------------------------+-------+
8 rows in set (0.00 sec)
Maybe this helps with those variable names - https://github.com/facebook/mysql-5.6/issues/374
Not sure why the write buffer is 4M for you, or for me -> https://github.com/facebook/mysql-5.6/issues/375
Thanks for the info. FYI, I am about to do a round of testing on a table with a primary key to see if that makes any difference.
Please re-open if this is still an issue. We added more data about RocksDB engine status recently as well.
I am loading five tables as part of the star schema benchmark.
I got an unusual stall loading one of the tables: -- fact table is expected to take a long time to load mysql> load data infile '/home/justin/src/ssb-dbgen/lineorder.tbl' into table lineorder fields terminated by '|' lines terminated by '|\n' ; Query OK, 119994608 rows affected (3 hours 42 min 11.51 sec) Records: 119994608 Deleted: 0 Skipped: 0 Warnings: 0
-- this load was fast mysql> load data infile '/home/justin/src/ssb-dbgen/date.tbl' into table dim_date fields terminated by '|' lines terminated by '|\n' ;Query OK, 2556 rows affected (0.04 sec) Records: 2556 Deleted: 0 Skipped: 0 Warnings: 0
-- this was about 5x slower than expected mysql> load data infile '/home/justin/src/ssb-dbgen/part.tbl' into table part fields terminated by '|' lines terminated by '|\n' ; Query OK, 1000000 rows affected (1 min 2.25 sec) Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0
-- then this one was fast again (though there was some delay between the prior and this one) mysql> load data infile '/home/justin/src/ssb-dbgen/supplier.tbl' into table supplier fields terminated by '|' lines terminated by '|\n' ; Query OK, 40000 rows affected (0.41 sec) Records: 40000 Deleted: 0 Skipped: 0 Warnings: 0
I tried loading the part table a second time, and it was fast:
mysql> rename table part to part2; Query OK, 0 rows affected (0.01 sec)
mysql> create table part like part2; Query OK, 0 rows affected (0.02 sec)
-- this time it loads faster mysql> load data infile '/home/justin/src/ssb-dbgen/part.tbl' into table part fields terminated by '|' lines terminated by '|\n' ; Query OK, 1000000 rows affected (15.17 sec) Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0
Do you have any hints about where to look / what to examine if I see this again? pmp?