What happens?
I want to export data from the che_test table to Parquet. The table has 8 million+ rows, but my export statement is COPY (SELECT * FROM che_test limit 1) TO 'che_test_1.parquet'. Since it selects only one row, the export should be very fast, yet it actually takes 16s to execute. This needs to be optimized.
I have 30 million+ rows of MySQL data in my production environment. I want to export a Parquet file every day according to the business date; each day covers roughly 2 million+ rows. Exporting this data requires about 10 GB of memory and is very slow. Also, after exporting the data from a 20 million+ row table, I found that the total row count does not exactly match the database and the value of the last field is missing, but I can't be sure that this is a bug.
To Reproduce
python code
import duckdb

# Requires the DuckDB MySQL extension (INSTALL mysql; once, then LOAD mysql;).
sql = '''
INSTALL mysql;
LOAD mysql;
ATTACH 'host=localhost user=root password=123456 port=3306 database=che' AS mysqldb (TYPE mysql);
USE mysqldb;
COPY (SELECT * FROM che_test limit 1) TO 'che_test_1.parquet';
'''
duck_df = duckdb.sql(sql)
OS:
Windows 10
MySQL Version:
8.0.22
DuckDB Version:
0.9.2
DuckDB Client:
Python
Full Name:
Bleeth
Affiliation:
developer
Have you tried this on the latest main branch?
[X] I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
Thanks for the report! I've pushed a fix in #30 that enables pushing LIMIT/OFFSET into MySQL scans, which should solve the performance issue on the provided query.