Closed anentropic closed 4 days ago
Thank you for considering this feature. We believe it is highly valuable and essential. We’ll begin preparing the necessary code and documentation right away, so please stay tuned for updates.
Awesome, thanks!
I found that using the psql client instead allows me to see the table I previously created via Python+duckdb:
mysql=> \d
List of relations
Schema | Name | Type | Owner
--------------------+---------------------+-------+----------
main | binlog_position | table | postgres
performance_schema | global_status | table | postgres
main | persistent_variable | table | postgres
main | duck_table | table | postgres
(4 rows)
and the main schema here matches what I saw on the Python+duckdb side.
Furthermore, if I create a new schema and table via psql, I can see it as a 'database' in mysql; I can then use myschema and query the table.
However, this doesn't carry through to the duckdb interface:
In [1]: import duckdb
In [2]: conn = duckdb.connect("docker/duckdb/mysql.db")
In [4]: conn.execute("select current_catalog()").fetchall()
Out[4]: [('mysql',)]
In [6]: conn.execute("select current_schemas(true)").fetchall()
Out[6]: [(['main', 'main', 'main', 'pg_catalog'],)]
In [7]: conn.execute("select current_schema()").fetchall()
Out[7]: [('main',)]
In [9]: conn.execute("select current_schemas(false)").fetchall()
Out[9]: [([],)]
Additionally, in mysql I can use main (even though it is not listed if I query show databases;) and create a table that is visible to psql:
mysql> use main;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> create table dummy2 (id int);
Query OK, 0 rows affected (0.06 sec)
mysql> insert into dummy2 (id) values (123);
Query OK, 1 row affected (0.07 sec)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| myschema |
| information_schema |
| mysql |
| performance_schema |
+--------------------+
4 rows in set (0.02 sec)
mysql=> \d
List of relations
Schema | Name | Type | Owner
--------------------+---------------------+-------+----------
main | binlog_position | table | postgres
myschema | dummy | table | postgres
main | dummy2 | table | postgres
performance_schema | global_status | table | postgres
main | persistent_variable | table | postgres
(5 rows)
However this still isn't visible to duckdb:
In [29]: conn = duckdb.connect("docker/duckdb/mysql.db")
In [30]: conn.execute("show tables").fetchall()
Out[30]: []
In [31]: conn.execute("select current_schema()").fetchall()
Out[31]: [('main',)]
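One way to rule out a search-path problem is to list every table the connection can see, across all catalogs and schemas, rather than relying on show tables. The following is a minimal sketch, not a diagnosis: list_all_tables is a hypothetical helper name, and it assumes only that the connection object offers execute(...).fetchall(), as the duckdb connection in the session above does. It may well return nothing here if the cause lies elsewhere.

```python
def list_all_tables(conn):
    """Return (catalog, schema, table) tuples for every table visible to conn.

    conn is assumed to be a duckdb connection, or anything else exposing
    execute(sql).fetchall(). information_schema.tables is queried so that
    tables outside the current schema are included as well.
    """
    rows = conn.execute(
        "SELECT table_catalog, table_schema, table_name "
        "FROM information_schema.tables "
        "ORDER BY table_catalog, table_schema, table_name"
    ).fetchall()
    return [tuple(row) for row in rows]


def format_tables(tables):
    """Render the tuples as catalog.schema.table strings, one per table."""
    return [".".join(parts) for parts in tables]
```

For example, with the file from the session above, conn = duckdb.connect("docker/duckdb/mysql.db", read_only=True) followed by format_tables(list_all_tables(conn)) would show whether dummy2 or myschema.dummy exist anywhere in the file as duckdb reads it.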
So currently I can:
Thank you for your thoughtful reply!
Additionally, in mysql I can use main (even though it is not listed if I query show databases;) and create a table that is visible to psql:
The main database in DuckDB currently has a visibility issue when accessed through a MySQL client. I’m actively working on a fix and will update you here once it’s resolved.
- add data via Python+duckdb that is visible to Postgres but not MySQL
- add data via MySQL that is visible to Postgres but not Python+duckdb
- add data via Postgres that is visible to MySQL but not Python+duckdb
These test cases are very thorough. We’ll add them to CI testing soon. Your detailed feedback is much appreciated!
I found a way to load Parquet data that is visible to MySQL:
- connect with psql
- make a query like CREATE TABLE myschema.duck_table AS SELECT * FROM 'snapshots/*.parquet', using the local path to the mounted volume within the container (i.e. the query tells the server where to load the files from, rather than sending the files from the host machine)
Loaded data is not visible to Python+duckdb, as noted previously, but using this method I can bootstrap the myduckserver and then query it via MySQL client as intended.
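The psql step above could also be scripted. The sketch below only builds the same CREATE TABLE ... AS SELECT statement and hands it to a connection; build_parquet_ctas and load_parquet are hypothetical names, and it is an assumption (not something confirmed in this thread) that a Python DB-API driver such as psycopg2 behaves against the server's Postgres port the way psql does.

```python
import re

# Plain identifiers only, so schema and table names can be inlined safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_parquet_ctas(schema, table, parquet_glob):
    """Build the CTAS statement that loads Parquet files into schema.table.

    parquet_glob is a path as seen from inside the container, because the
    server reads the files itself; nothing is uploaded from the host.
    """
    for name in (schema, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    path = parquet_glob.replace("'", "''")  # escape quotes inside the SQL literal
    return f"CREATE TABLE {schema}.{table} AS SELECT * FROM '{path}'"


def load_parquet(conn, schema, table, parquet_glob):
    """Run the statement on a DB-API connection (assumed: e.g. psycopg2)."""
    cur = conn.cursor()
    cur.execute(build_parquet_ctas(schema, table, parquet_glob))
    conn.commit()
```

Calling build_parquet_ctas("myschema", "duck_table", "snapshots/*.parquet") reproduces the query quoted above; the connection parameters for load_parquet depend on how the container's ports are mapped and are left out here.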
@anentropic We’ve addressed some of the visibility issues with the main database and added a README in PR #154 to assist with your requirements. Please give it a try and let us know if it meets your expectations. Thank you!
Super interested in the project, looks like exactly what I need!
I wanted to test it out locally by bootstrapping the db from a bunch of parquet files I downloaded from an RDS snapshot (I had previously tested out raw duckdb by the same method, it's easy to load these files)
I connected a volume to the /home/admin/data dir and ran the container.
Locally on my host machine I was able to see the mysql.db file coming from the volume and load a parquet table into it, via Python+duckdb.
If I docker exec into the container I can see the mysql.db file has the same size (0.5 GB) when viewed from inside and outside. From Python side I can connect to the file and query the data I loaded.
When I connect to the container via mysql cli client I do not get expected results.
So the use db command behaves a bit strange. But also I can't find any of the data I loaded via Python.
So I've created a table in the main schema as far as duckdb is concerned, but neither of these things seems to exist. I also tried creating a new schema from duckdb, but it is not visible in mysql client either.
Coming at things from the other side, I did a create database from mysql client... but how do I then select that as a catalog/schema to work with from duckdb side?
(I realise I have bypassed the intended way of using the system, but if there's any pointers you can give in this direction it would help me to evaluate it... I just want to bootstrap it via duckdb interface and then query it via mysql)