facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.
http://myrocks.io
Other
2.48k stars 714 forks source link

mutex contention from shard level CPU stats #658

Open mdcallag opened 7 years ago

mdcallag commented 7 years ago

Sysbench QPS is much lower at high concurrency with this diff:

commit 28c37186a5de54d26624b5a0d274078b7f47bb10
Author: Anirban Rahut <arahut@fb.com>
Date:   Sat Jun 17 10:24:17 2017 -0700
    Shard level CPU stats (system + user ) for MySQLD

This is QPS for 1, 8 and 48 threads doing point-lookup queries for an in-memory database with sysbench

1            8           48           clients
16333   128531  353950  myrocks.feb10
14654   118656  336515  myrocks.apr14
14030   111232  192247  myrocks.jul14

Command line is:

./sysbench --test=/data/mysql/sysbench.lua/oltp_point_select.lua --db-driver=mysql --mysql-user=... --mysql-password=... --mysql-host=127.0.0.1 --mysql-db=test --mysql-storage-engine=rocksdb --range-size=100 --table-size=1000000 --tables=8 --threads=48 --events=0 --time=120 run
mdcallag commented 7 years ago

An example from PMP

     24 __lll_lock_wait,pthread_mutex_lock,get_db_stats(char,mysql_execute_command(THD*,,Prepared_statement::execute(String*,,Prepared_statement::execute_loop(String*,,mysqld_stmt_execute(THD*,,dispatch_command(enum_server_command,,do_command(THD*),do_handle_one_connection(THD*),handle_one_connection(void*),start_thread,clone

#0  0x00007f2fa0d11ebc in __lll_lock_wait () at /usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libpthread.so.0
#1  0x00007f2fa0d0c07b in pthread_mutex_lock () at /usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libpthread.so.0
#2  0x0000000000791c84 in get_db_stats(char const*) (db=0x7f2e02cab008 "test") at /data/users/mcallaghan/mysql.me/5.6/sql/db_stats.cc:85
#3  0x000000000091a280 in mysql_execute_command(THD*, unsigned long long*, unsigned long long*) (thd=thd@entry=0x7f2f9a75e000, statement_start_time=statement_start_time@entry=0x7f2e030d3cb0, post_parse=post_parse@entry=0x7f2e030d3cb8) at /data/users/mcallaghan/mysql.me/5.6/sql/sql_parse.cc:6147
#4  0x00000000009344ee in Prepared_statement::execute(String*, bool) (this=this@entry=0x7f2b68465180, expanded_query=expanded_query@entry=0x7f2e030d3f00, open_cursor=open_cursor@entry=false) at /data/users/mcallaghan/mysql.me/5.6/sql/sql_prepare.cc:4074
#5  0x00000000009346fe in Prepared_statement::execute_loop(String*, bool, unsigned char*, unsigned char*) (this=0x7f2b68465180, expanded_query=0x7f2e030d3f00, open_cursor=<optimized out>, packet=<optimized out>, packet_end=<optimized out>) at /data/users/mcallaghan/mysql.me/5.6/sql/sql_prepare.cc:3700
#6  0x00000000009349ca in mysqld_stmt_execute(THD*, char*, unsigned int) (thd=thd@entry=0x7f2f9a75e000, packet_arg=packet_arg@entry=0x7f2f9a763001 "", packet_length=packet_length@entry=19) at /data/users/mcallaghan/mysql.me/5.6/sql/sql_prepare.cc:2706
#7  0x00000000009223f3 in dispatch_command(enum_server_command, THD*, char*, unsigned int, Srv_session*) (command=command@entry=COM_STMT_EXECUTE, thd=thd@entry=0x7f2f9a75e000, packet=packet@entry=0x7f2f9a763001 "", packet_length=packet_length@entry=19, srv_session=srv_session@entry=0x0) at /data/users/mcallaghan/mysql.me/5.6/sql/sql_parse.cc:1778
#8  0x0000000000924124 in do_command(THD*) (thd=0x7f2f9a75e000) at /data/users/mcallaghan/mysql.me/5.6/sql/sql_parse.cc:1207
#9  0x00000000008e332a in do_handle_one_connection(THD*) (thd_arg=<optimized out>) at /data/users/mcallaghan/mysql.me/5.6/sql/sql_connect.cc:1142
#10 0x00000000008e3589 in handle_one_connection(void*) (arg=<optimized out>) at /data/users/mcallaghan/mysql.me/5.6/sql/sql_connect.cc:963
#11 0x00007f2fa0d097f1 in start_thread () at /usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libpthread.so.0
#12 0x00007f2f9ee1e46d in clone () at /usr/local/fbcode/gcc-4.9-glibc-2.20/lib/libc.so.6
mdcallag commented 7 years ago

I didn't look at the code but I am curious why the impact from this is much worse than the impact from existing user/table stats code. Is too much work done while holding the mutex?