facebook / mysql-5.6

Facebook's branch of the Oracle MySQL database. This includes MyRocks.
http://myrocks.io
Other
2.49k stars 715 forks source link

ANALYZE TABLE does not seem to update statistics for the RocksDB table #160

Closed valeriikravchuk closed 8 years ago

valeriikravchuk commented 8 years ago

Please, check this quick test:

mysql> select count() from tr; +----------+ | count() | +----------+ | 2048 | +----------+ 1 row in set (0.01 sec)

mysql> select count(distinct c1) from tr; +--------------------+ | count(distinct c1) | +--------------------+ | 872 | +--------------------+ 1 row in set (0.01 sec)

mysql> show create table tr\G *** 1. row *** Table: tr Create Table: CREATE TABLE tr ( id int(11) NOT NULL AUTO_INCREMENT, c1 int(11) DEFAULT NULL, PRIMARY KEY (id), KEY c1 (c1) ) ENGINE=ROCKSDB AUTO_INCREMENT=4085 DEFAULT CHARSET=utf8 COLLATE=utf8_bin 1 row in set (0.00 sec)

So, we have many unique values in the table for both primary and secondary index. But even after ANALYZE:

mysql> analyze table tr; +---------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------+---------+----------+----------+ | test.tr | analyze | status | OK | +---------+---------+----------+----------+ 1 row in set (0.00 sec)

mysql> show indexes from tr\G *** 1. row *** Table: tr Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: id Collation: A Cardinality: 2 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: Index_comment: *** 2. row *** Table: tr Non_unique: 1 Key_name: c1 Seq_in_index: 1 Column_name: c1 Collation: A Cardinality: 1 Sub_part: NULL Packed: NULL Null: YES Index_type: BTREE Comment: Index_comment: 2 rows in set (0.00 sec)

mysql> explain select count(*) from tr; +----+-------------+-------+-------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+-------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | tr | index | NULL | c1 | 5 | NULL | 2 | Using index | +----+-------------+-------+-------+---------------+------+---------+------+------+-------------+ 1 row in set (0.00 sec)

we do not get anything remotely close to real cardinality for both indexes. I feel this may influence queries with joins badly.

valeriikravchuk commented 8 years ago

It seems cardinality depends on "initial" table's content, we can get better estimations after converting to MyISAM and back:

mysql> alter table tr engine=MyISAM; Query OK, 2048 rows affected (0.13 sec) Records: 2048 Duplicates: 0 Warnings: 0

mysql> analyze table tr; +---------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------+---------+----------+----------+ | test.tr | analyze | status | OK | +---------+---------+----------+----------+ 1 row in set (0.00 sec)

mysql> show indexes from tr\G *** 1. row *** Table: tr Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: id Collation: A Cardinality: 2048 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: Index_comment: *** 2. row *** Table: tr Non_unique: 1 Key_name: c1 Seq_in_index: 1 Column_name: c1 Collation: A Cardinality: 1024 Sub_part: NULL Packed: NULL Null: YES Index_type: BTREE Comment: Index_comment: 2 rows in set (0.00 sec)

mysql> alter table tr engine=RocksDB; Query OK, 2048 rows affected (0.28 sec) Records: 2048 Duplicates: 0 Warnings: 0

mysql> analyze table tr; +---------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------+---------+----------+----------+ | test.tr | analyze | status | OK | +---------+---------+----------+----------+ 1 row in set (0.00 sec)

mysql> show indexes from tr\G *** 1. row *** Table: tr Non_unique: 0 Key_name: PRIMARY Seq_in_index: 1 Column_name: id Collation: A Cardinality: 942 Sub_part: NULL Packed: NULL Null: Index_type: BTREE Comment: Index_comment: *** 2. row *** Table: tr Non_unique: 1 Key_name: c1 Seq_in_index: 1 Column_name: c1 Collation: A Cardinality: 471 Sub_part: NULL Packed: NULL Null: YES Index_type: BTREE Comment: Index_comment: 2 rows in set (0.00 sec)

hermanlee commented 8 years ago

During the initial analyze table, was the table newly created and the data recently added? The index cardinality implementation has a limitation. Cardinality is calculated during the flush/compaction process when the memtable is written out to the sst data files. So when the data is flushed out, there should be an improvement in the cardinality stats. However, while the data lives entire in the memtable (as is the case with newly created tables/added data that did not trigger a flush), no cardinality stats would have been collected. In your test, it's possible the alter triggered a flush on the new table contents and gave better cardinality.

Could you try a larger data set or issue the command "set global rocksdb_force_flush_memtable_now=1" to trigger a flush of the memtable data after the table's been populated if your test consists mainly of small tables.

valeriikravchuk commented 8 years ago

I'll try to do these tests, but the point is that in MySQL world it's expected that explicit ANALYZE TABLE statement should get statistics forcefully updated based on current table contents, no matter when it is usually re-estimated (by background threads in case of InnoDB and its "persistent" statistics). So, I'd expect whatever flushes and calculations are needed to get updated statistics are executed automatically when ANALYZE TABLE runs.

yoshinorim commented 8 years ago

This issue happens because index statistics (cardinality) are stored in SST files, but not in Memtable, and ANALYZE TABLE currently doesn't flush Memtable. Flushing Memtable is pretty expensive. But without flushing, SHOW INDEX is confusing to users. So there are trade-offs. I think it may make sense to add a new session variable "rocksdb_flush_memtable_on_analyze" (default on) and triggering Memtable flush on ANALYZE TABLE.

valeriikravchuk commented 8 years ago

I agree with Yoshinori, the option forcing ANALYZE TABLE to do what users expect would be the ideal solution.

mdcallag commented 8 years ago

I'd like to see an option to flush the memtable on ANALYZE

On Wed, Feb 3, 2016 at 8:07 AM, Yoshinori Matsunobu < notifications@github.com> wrote:

This issue happens because index statistics (cardinality) are stored in SST files, but not in Memtable, and ANALYZE TABLE currently doesn't flush Memtable. Flushing Memtable is pretty expensive. But without flushing, SHOW INDEX is confusing to users. So there are trade-offs. I think it may make sense to add a new session variable "rocksdb_flush_memtable_on_analyze" (default on) and triggering Memtable flush on ANALYZE TABLE.

— Reply to this email directly or view it on GitHub https://github.com/facebook/mysql-5.6/issues/160#issuecomment-179316012.

Mark Callaghan mdcallag@gmail.com