theferrit32 opened 1 month ago
Semi-related, this was just posted today:
https://news.ycombinator.com/item?id=41871068
pg_parquet is a PostgreSQL extension that allows you to read and write Parquet files, which are located in S3 or on a file system, from PostgreSQL via COPY TO/FROM commands.
https://www.crunchydata.com/blog/pg_parquet-an-extension-to-connect-postgres-and-parquet
Submitter Name
Kyle Ferriter
Submitter Affiliation
Broad Institute
Project Details
We would like to load GKS data into a relational model for rich querying. We currently rely heavily on BigQuery in backend processes, but would like something free and portable that can be distributed to users. AnyVar uses PostgreSQL, with most data stored in JSON columns. We would like something easier to configure and run than Postgres, such as an embedded, single-file or single-directory database like SQLite, RocksDB, or DuckDB. We have looked briefly at DuckDB, which can import and export Parquet and NDJSON files and provide a SQL query interface over them with essentially no configuration. One difficulty with NDJSON import is that automatic schema inference fails when the data contains heterogeneous rows, which is the case for GKS data.
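One possible workaround for the heterogeneous-row problem is sketched below. It uses Python's built-in `sqlite3` module as the embedded single-file database and assumes SQLite's JSON functions (`json_extract`) are available, as they are in recent SQLite builds. Rather than asking the database to infer one tabular schema, the sketch promotes only the fields every row shares to real columns and keeps each full record as a JSON document. This is similar in spirit to AnyVar's JSON columns. The shared fields are assumed here to be `id` and `type`. The records, the `objects` table, and the `load_ndjson` / `allele_positions` helpers are hypothetical and simplified. They are not real GKS/VRS output, and this is not a DuckDB recipe.

```python
import json
import sqlite3

# Hypothetical, simplified GKS-like records (NOT real VRS output): the rows
# have different shapes, which is what breaks automatic schema inference.
NDJSON = """\
{"id": "example:allele1", "type": "Allele", "location": {"start": 100, "end": 101}, "state": {"sequence": "T"}}
{"id": "example:cn1", "type": "CopyNumberCount", "copies": 3}
{"id": "example:allele2", "type": "Allele", "location": {"start": 250, "end": 251}, "state": {"sequence": "G"}}
"""


def load_ndjson(conn: sqlite3.Connection, text: str) -> int:
    """Store each NDJSON record as a JSON document keyed by id and type.

    Only the fields assumed to be common to every row become real columns;
    everything else stays inside the JSON text, so no up-front schema is
    needed for the heterogeneous parts. Returns the number of rows loaded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        " id TEXT PRIMARY KEY, type TEXT NOT NULL, doc TEXT NOT NULL)"
    )
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        rows.append((record["id"], record["type"], json.dumps(record)))
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def allele_positions(conn: sqlite3.Connection) -> list:
    """Query nested fields of Allele rows using SQLite's json_extract."""
    return conn.execute(
        "SELECT id, json_extract(doc, '$.location.start'),"
        " json_extract(doc, '$.state.sequence')"
        " FROM objects WHERE type = 'Allele' ORDER BY id"
    ).fetchall()


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; passing a file path instead
    # (e.g. a hypothetical "gks.sqlite") yields a single-file database.
    conn = sqlite3.connect(":memory:")
    print(load_ndjson(conn, NDJSON))
    print(allele_positions(conn))
```

Keeping the raw document trades some query speed and type checking for flexibility. Nested fields that are queried often could later be promoted to their own columns or covered by SQLite expression indexes.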
We are aiming for these deliverables:
Stretch:
Required Skills