Open thbley opened 1 year ago
I found the problem for c1 = 1000000. Based on this commit (https://github.com/duckdb/duckdb_mysql/commit/d391e5b3cd2cae6c084a31b26bc6a702c82ec7a0), I was missing:
SET GLOBAL mysql_experimental_filter_pushdown=true;
D select * from t1 where c1 = 1000000;
┌─────────┬──────────────────────────────────────┬─────────────────────┐
│ c1 │ c2 │ c3 │
│ int32 │ varchar │ timestamp │
├─────────┼──────────────────────────────────────┼─────────────────────┤
│ 1000000 │ c37de652-83ad-11ee-b826-0242ac110002 │ 2023-11-15 11:54:41 │
└─────────┴──────────────────────────────────────┴─────────────────────┘
Run Time (s): real 30.895 user 11.832409 sys 1.670956
D SET GLOBAL mysql_experimental_filter_pushdown=true;
Run Time (s): real 0.000 user 0.000000 sys 0.000171
D select * from t1 where c1 = 1000000;
┌─────────┬──────────────────────────────────────┬─────────────────────┐
│ c1 │ c2 │ c3 │
│ int32 │ varchar │ timestamp │
├─────────┼──────────────────────────────────────┼─────────────────────┤
│ 1000000 │ c37de652-83ad-11ee-b826-0242ac110002 │ 2023-11-15 11:54:41 │
└─────────┴──────────────────────────────────────┴─────────────────────┘
Run Time (s): real 0.013 user 0.005777 sys 0.001366
It would be great if select count() could be optimized in the mysql extension when the table has a primary key, e.g.:
D select count() from t1;
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 8388608 │
└──────────────┘
Run Time (s): real 4.287 user 1.170847 sys 0.196719
D select count(c1) from t1;
┌───────────┐
│ count(c1) │
│ int64 │
├───────────┤
│ 8388608 │
└───────────┘
Run Time (s): real 17.080 user 2.306154 sys 0.252885
D select count(*) from t1;
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 8388608 │
└──────────────┘
Run Time (s): real 4.438 user 0.941101 sys 0.221295
D select count(*) from t1 where c1 > 0;
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 8388608 │
└──────────────┘
Run Time (s): real 18.549 user 2.303993 sys 0.310117
D select count(*) from t1 where c1 != 0;
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 8388608 │
└──────────────┘
Run Time (s): real 16.512 user 2.107903 sys 0.258574
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
| 8388608 |
+----------+
1 row in set (0.76 sec)
mysql> select count(c1) from t1;
+-----------+
| count(c1) |
+-----------+
| 8388608 |
+-----------+
1 row in set (0.77 sec)
mysql> select count(*) from t1 where c1 != 0;
+----------+
| count(*) |
+----------+
| 8388608 |
+----------+
1 row in set (4.45 sec)
mysql> explain select count(*) from t1;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | NULL | PRIMARY | 4 | NULL | 8161825 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
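For a quick side-by-side of the count timings above, here is a small Python sketch that divides the DuckDB wall-clock time by the MySQL client time for the queries that were run in both. The numbers are copied from the outputs above; they are single runs, so treat the ratios as a rough indication rather than a benchmark.

```python
# Wall-clock seconds copied from the outputs above
# ("real" for DuckDB, "sec" for the mysql client). Single runs only.
timings = {
    "count(*)": (4.438, 0.76),
    "count(c1)": (17.080, 0.77),
    "count(*) where c1 != 0": (16.512, 4.45),
}


def slowdown(duckdb_s: float, mysql_s: float) -> float:
    """How many times longer the query took through DuckDB."""
    return duckdb_s / mysql_s


ratios = {query: slowdown(d, m) for query, (d, m) in timings.items()}

if __name__ == "__main__":
    for query, ratio in ratios.items():
        print(f"{query}: {ratio:.1f}x")
```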
What happens?
I tested duckdb v0.9.2 3c695d7ba9 with the latest linux_amd64_gcc4-extensions build (https://github.com/duckdb/duckdb_mysql/actions/runs/6863109179) and mysql 8.0.35.
Currently, queries through duckdb are slower than the same queries in the mysql client. Query results are correct, and utf8mb4 works!
I tested:
with mysql client:
It would be great if performance could be optimized a bit for primary key selects!
Copying data from mysql to duckdb was very fast, much faster than copying inside of mysql!
To Reproduce
Create a table with 3 columns (int, varchar, datetime) and fill it with 1m rows. Execute queries to select from primary key and run a count(*).
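One possible way to build such a table is sketched below: a Python helper that generates MySQL statements which seed one row and then double the table with INSERT ... SELECT. This is not the script used for the issue. The table and column names (t1, c1, c2, c3) follow the outputs shown, but the exact column types, the use of UUID() and NOW(), and the doubling approach are assumptions. With 23 doublings the table ends up with 2^23 = 8388608 rows, which matches the counts shown; 20 doublings give 1048576 rows.

```python
def repro_statements(doublings: int = 23, table: str = "t1") -> list[str]:
    """Return MySQL statements that build a table with 2**doublings rows.

    Hypothetical helper: column types and fill values are assumptions.
    """
    stmts = [
        f"CREATE TABLE {table} ("
        "c1 INT NOT NULL PRIMARY KEY, "
        "c2 VARCHAR(36) NOT NULL, "
        "c3 DATETIME NOT NULL"
        ") DEFAULT CHARSET=utf8mb4;",
        f"INSERT INTO {table} (c1, c2, c3) VALUES (1, UUID(), NOW());",
    ]
    rows = 1
    for _ in range(doublings):
        # Copy every existing row with its key shifted past the current
        # maximum, so keys stay dense (1..rows) and never collide.
        stmts.append(
            f"INSERT INTO {table} (c1, c2, c3) "
            f"SELECT c1 + {rows}, UUID(), NOW() FROM {table};"
        )
        rows *= 2
    return stmts


if __name__ == "__main__":
    # Print the statements so they can be piped into the mysql client.
    print("\n".join(repro_statements()))
```

Because c1 is assigned explicitly instead of through AUTO_INCREMENT, every key from 1 up to the final row count exists, so a lookup such as c1 = 1000000 finds a row once the table has at least that many rows.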
OS:
Ubuntu 22.04.3
MySQL Version:
8.0.35
DuckDB Version:
0.9.2
DuckDB Client:
cli
Full Name:
Thomas Bley
Affiliation:
myself
Have you tried this on the latest main branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?