duckdb / duckdb_mysql

https://duckdb.org/docs/extensions/mysql
MIT License
55 stars 13 forks source link

MySQL -> DuckDB export causes segfault #90

Closed benjamingr closed 1 week ago

benjamingr commented 1 month ago

What happens?

When running MySQL -> DuckDB import on a simple setup (testcontainers) a segfault occurs.

To Reproduce

When setting up test containers with duckdb to evaluate MySQL export as such (full fixture setup code):

def populate_mysql(mysql: MySqlContainer):
    connection = pymysql.connect(
        host=mysql.get_container_host_ip(),
        port=int(mysql.get_exposed_port(3306)),
        user=mysql.username,
        password=mysql.password,
        database=f"test",
        autocommit=True
    )

    with connection.cursor() as cursor:
        for table_idx in range(TABLE_COUNT):
            # Create users table with columns "id", "email", "password_hash", "credit_card_number" with fake PII
            cursor.execute(
                f"CREATE TABLE users{table_idx} (id INT AUTO_INCREMENT PRIMARY KEY, email TEXT, password_hash TEXT, credit_card_number TEXT);")
            # Insert some fake data
            for i in range(100):
                cursor.execute(
                    f"INSERT INTO users{table_idx} (email, password_hash, credit_card_number) VALUES ('user{i}@example.com', 'hash{i}', '1111-2222-3333-4444');")
            # Create orders table with columns "id", "user_id", "order_date", "order_total", "address" with fake PII
            cursor.execute(
                f"CREATE TABLE orders{table_idx} (id INT AUTO_INCREMENT PRIMARY KEY, user_id INT, order_date TIMESTAMP, order_total DECIMAL(10,2), address TEXT);")
            # Insert some fake transactions
            for i in range(100):
                cursor.execute(
                    f"INSERT INTO orders{table_idx} (user_id, order_date, order_total, address) VALUES ({(i + 10) % 10}, '2022-01-01 00:00:00', 100.00, '1234 Main St, Springfield, IL 62701');")
    connection.close()

And then connecting to the testcontainer and performing "export from" on it (similar PostgreSQL code works):

INSTALL mysql;
LOAD mysql;

ATTACH 'database=test user=test password=test host=localhost port=60057' AS d (TYPE MYSQL, READ_ONLY);

--- Executing this line causes a segfault, crashes DBeaver (or Python when running Python code)
COPY FROM DATABASE d TO memory;

OS:

macOS 14.3 (23D56)

DuckDB Version:

latest ( "1.1.1" )

DuckDB Client:

Python (also with DBeaver)

Hardware:

M3 max

Full Name:

Benjamin Gruenbaum

Affiliation:

Eon.io

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have tested with a stable release

Did you include all relevant data sets for reproducing the issue?

Yes

Did you include all code required to reproduce the issue?

Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?

benjamingr commented 1 month ago

(Just for reference, running regular queries against the MySQL code works, e.g. SELECT * from d.users0 works and returns the correct results)