datacleaner / DataCleaner

The premier open source Data Quality solution
GNU Lesser General Public License v3.0
598 stars 181 forks

Use DuckDB as a datastore #1952

Open msimanga opened 1 year ago

msimanga commented 1 year ago

DuckDB is a fairly recent entrant into the embedded database scene. Two of its strengths are that it can query Parquet files directly and that it is a column-store database suited to OLAP workloads.

I was able to create a DuckDB datastore in DataCleaner using the DuckDB JDBC driver, following the instructions written for DBeaver here.

However, DataCleaner does not pick up the tables in the DuckDB database. As far as I can tell, these are the relevant log lines:

10:40:19.405 [AWT-EventQueue-0] INFO o.a.m.jdbc.JdbcMetadataLoader - No table metadata records returned for schema 'information_schema'
10:40:19.405 [AWT-EventQueue-0] INFO o.a.m.jdbc.JdbcMetadataLoader - No table metadata records returned for schema 'information_schema'
10:40:19.408 [AWT-EventQueue-0] INFO o.a.m.jdbc.JdbcMetadataLoader - No table metadata records returned for schema 'main'
10:40:19.408 [AWT-EventQueue-0] INFO o.a.m.jdbc.JdbcMetadataLoader - No table metadata records returned for schema 'main'
10:40:19.410 [AWT-EventQueue-0] INFO o.a.m.jdbc.JdbcMetadataLoader - No table metadata records returned for schema 'pg_catalog'
10:40:19.410 [AWT-EventQueue-0] INFO o.a.m.jdbc.JdbcMetadataLoader - No table metadata records returned for schema 'pg_catalog'
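One way to narrow this down might be to ask the driver directly, through plain JDBC and outside DataCleaner, which tables and table types it reports. The sketch below is only a diagnostic under stated assumptions: `TableTypeProbe` is a hypothetical class name, the database path is a placeholder, and the idea that the metadata loader filters on table types such as `TABLE` and `VIEW` is an unverified guess, not something confirmed by the log lines above. It uses only the standard `java.sql` API (`DatabaseMetaData.getTables`) and needs the DuckDB JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic class; not part of DataCleaner or MetaModel.
class TableTypeProbe {

    // Formats one metadata row as "schema.table [type]".
    static String describe(String schema, String table, String type) {
        return schema + "." + table + " [" + type + "]";
    }

    // Lists every table the driver reports. A null 'types' array means
    // "do not filter by table type" in the JDBC API.
    static List<String> listTables(DatabaseMetaData md, String[] types) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (ResultSet rs = md.getTables(null, null, "%", types)) {
            while (rs.next()) {
                rows.add(describe(rs.getString("TABLE_SCHEM"),
                                  rs.getString("TABLE_NAME"),
                                  rs.getString("TABLE_TYPE")));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // Placeholder path; substitute the real database file.
            System.out.println("usage: TableTypeProbe jdbc:duckdb:/path/to/file.duckdb");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            DatabaseMetaData md = conn.getMetaData();
            // Everything the driver knows about, with its reported type.
            System.out.println("Unfiltered:      " + listTables(md, null));
            // Assumed filter (unverified): only TABLE and VIEW types.
            System.out.println("TABLE/VIEW only: " + listTables(md, new String[] {"TABLE", "VIEW"}));
        }
    }
}
```

If the unfiltered call lists the tables but the filtered call comes back empty, the type names reported by the driver would be worth comparing against whatever the metadata loader requests.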

Has anyone had any success using DuckDB with DataCleaner?