junranhe opened 5 months ago
Thanks for the report! This extension does not push aggregations down into MySQL, meaning the table is fully loaded into DuckDB before the aggregation is processed there. The new mysql_query functionality allows you to run a query directly within MySQL, see https://github.com/duckdb/duckdb_mysql/pull/50. This is effectively what other connectors (e.g. the JDBC connector) do.
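For reference, a minimal sketch of how mysql_query can be used, assuming a database attached under the name mysqldb; the table and column names here are illustrative, not from the original report:

```sql
-- Run the aggregation inside MySQL instead of loading the table into DuckDB.
-- 'mysqldb' must match the name given in the ATTACH statement.
SELECT *
FROM mysql_query('mysqldb', 'SELECT SUM(amt) AS total FROM order_info');
```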
Thanks for the reply. I'm Chinese and my English is poor. In my case the data is very small (3 rows) and the query is very simple (1 line), so I can't understand why preparing the statement takes 350 ms (prepare only, no execution) and loading 3 rows takes another 400 ms. I just want to build a remote JDBC server with DuckDB. mysql_query is just a simple function; I cannot set parameters in the query string, like: select * from tb where a = ? and b = ?; stat.setString(1, 'jr'); stat.setInt(2, 100);
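Since mysql_query takes a literal SQL string, JDBC-style `?` placeholders cannot be bound to it. One possible workaround is to interpolate the values into the string with basic escaping before passing it to mysql_query; a minimal sketch (the helper below is hypothetical, not part of the extension or of JDBC):

```java
// Sketch: build the SQL string manually because mysql_query accepts only a
// literal query string, not bound parameters.
public class InlineParams {
    // Escape a string value for embedding as a SQL string literal
    // (doubles single quotes).
    static String sqlString(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Assemble the query text that would be passed to mysql_query.
    static String buildQuery(String a, int b) {
        return "SELECT * FROM tb WHERE a = " + sqlString(a) + " AND b = " + b;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("jr", 100));
        // prints: SELECT * FROM tb WHERE a = 'jr' AND b = 100
    }
}
```

Manual interpolation like this is only safe with careful escaping; proper parameter binding inside mysql_query would still be the better solution.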
On the first query the catalog information is loaded, which could be where the time is going. Could you try running the query a second time?
I ran the query many times; it consistently takes 350 ms + 400 ms.
Hm, that's a bit excessive yes. Could you try this:
BEGIN TRANSACTION;
.timer on
select sum(amt) from mysqldb.order.order_info;
select sum(amt) from mysqldb.order.order_info;
I tried 2 queries inside the transaction. The first query is slow (350 ms prepare + 400 ms execute); the second query is fast (2 ms prepare, 20 ms execute).
@Mytherin hello, can you give me some advice? Or will this problem be fixed in the next duckdb_mysql version? Then I will close this issue.
I encounter the same performance decrease on MySQL: TPC-H (sf=1) Query 17 costs 2.15 s with PG + DuckDB while MySQL costs 30.37 s! And filter_pushdown does not help. I sampled both scenarios and checked their explain analyze output, as attached. I notice that the mysql_scan nodes have a wrong estimated cardinality (EC=1), causing a different join order. What's more, the Postgres flamegraph shows a PerfectHashJoinExecutor while the MySQL one does not. Might these be related?

I also set debug_show_queries in both and found that the MySQL scanner does not implement the parallel scan used in the Postgres scanner.
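For anyone reproducing this, a sketch of how the queries sent to MySQL can be inspected, assuming the setting is named mysql_debug_show_queries as in current duckdb_mysql builds:

```sql
-- Print every query the extension sends to MySQL, which shows what is
-- pushed down and whether the scan is split into parallel ranges.
SET mysql_debug_show_queries = true;
SELECT sum(amt) FROM mysqldb.order.order_info;
```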
What happens?
I want to use DuckDB to speed up queries against a MySQL database, e.g.: select sum(amt) from mysqldb.order.order_info (3 rows, for testing). Preparing the query takes 350 ms and executing it takes 400 ms, which I think is too slow compared with using a MySQL JDBC client directly (2 ms). I expect the mysql extension to run in under 100 ms on small data (fewer than 1,000 rows) and to be 10x faster than MySQL JDBC on big data (more than 100,000 rows). How can I achieve this?
To Reproduce
install mysql; load mysql;
ATTACH "......" AS mysqldb (TYPE MYSQL); USE mysqldb;
select sum(amt) from mysqldb.order.order_info
OS:
mac
MySQL Version:
8.1
DuckDB Version:
0.10.1
DuckDB Client:
java
Full Name:
何俊然
Affiliation:
有信科技 youxin china
Have you tried this on the latest main branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?