duckdb / duckdb_mysql

MIT License
45 stars 10 forks source link

Migrating from pyodbc to duckdb_mysql challenges #69

Closed fillalph closed 2 months ago

fillalph commented 2 months ago

I am looking to migrate from the Python pyodbc package to duckdb_mysql extension in an enterprise environment.

/

conn = pyodbc.connect(
    "Driver={SQL Server};"+
    f"Server={DB_HOST};"+
    f"Database={DB_NAME};"+
    "Trusted_Connection=yes;"
)

sqlTable = "Table_Name"
sql = f"SELECT TOP 5 * FROM {sqlTable}"
df = pd.read_sql(sql, conn)

df.head()

This works well.

/

DB_PORT = '1433'
MYSQL_UNIX_PORT = '3306'

PARAMETERS = f'host={DB_HOST} port={DB_PORT} socket={MYSQL_UNIX_PORT} database={DB_NAME}'
ATTACH_QUERY = f'ATTACH \'{PARAMETERS}\' AS mysql_db (TYPE MYSQL, READ_ONLY);'
USE_QUERY = 'USE mysql_db;'
print(ATTACH_QUERY)

duckdb_con.sql(ATTACH_QUERY)
# duckdb_con.sql(USE_QUERY)
# duckdb_con.sql('SHOW TABLES').show()
IO Error: Failed to connect to MySQL database with parameters "host={} port=1433 socket=3306 database={}": Lost connection to MySQL server at 'reading initial communication packet', system error: 0

As the SQL database is hosted in an enterprise enviroment, I am not sure if I have the port and socket right. I tried using the netstat command in windows to figure it out. I see the server ip:ms-sql-s.

I have tried the ATTACH_QUERY command also without a port and socket (just a host and database) and I get this error:

IO Error: Failed to connect to MySQL database with parameters "host=PRODDWAGL2 database=ONESOURCEDATAMART": Can't connect to MySQL server on 'PRODDWAGL2:3306' (10061)

Does anyone have a thought on what I could try to get the connection established with duckdb_mysql?

Thanks

Mytherin commented 2 months ago

It looks like DB_HOST might not be configured in the first example?

The MySQL driver uses the same underlying infrastructure as the standard mysqlclient Python package (namely the official MySQL library) - if you can connect using that you should be able to connect using the MySQL extension as well.

fillalph commented 2 months ago

Hi @Mytherin.

Thank you for pointing out that the duckdb_mysql extension uses the mysqlclient Python package.

From my looking into things, mysqlclient doesn't have a way to add a "trusted_connection" parameter to the SQL connection string whereas the pyodbc Python package does.

There is a similar discussion here on the sqlalchemy GitHub discussion page about this connection parameter: https://github.com/sqlalchemy/sqlalchemy/discussions/11425?sort=new.

I will keep looking into this, but right now I don't believe I will be able to use DuckDB with the duckdb_mysql extension to make the connection.

Mytherin commented 2 months ago

Ah, I now see the problem. This extension is a MySQL client - it looks like you are trying to connect to SQL Server (or MSSQL). This is not supported and not expected to work using this extension.