cockroachdb / cockroach

sql: support COPY TO ... BINARY #97180

Open rafiss opened 1 year ago

rafiss commented 1 year ago

See https://www.postgresql.org/docs/current/sql-copy.html

The binary format is described in detail there, and the key passages are quoted below. We already support this for COPY FROM.

The binary format option causes all data to be stored/read as binary format rather than as text. It is somewhat faster than the text and CSV formats, but a binary-format file is less portable across machine architectures and PostgreSQL versions. Also, the binary format is very data type specific; for example it will not work to output binary data from a smallint column and read it into an integer column, even though that would work fine in text format.
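To illustrate the type-specificity point in the quoted passage, here is a minimal sketch, assuming the usual PostgreSQL wire encodings (smallint as a 2-byte and integer as a 4-byte big-endian value). The helper name `decode_int4` is hypothetical and only for illustration; it is not taken from PostgreSQL or CockroachDB.

```python
import struct

# Assumption: smallint (int2) is sent as 2 bytes and integer (int4) as
# 4 bytes, both in network byte order.
int2_wire = struct.pack("!h", 7)  # payload a smallint column would emit
int4_wire = struct.pack("!i", 7)  # payload an integer column would emit


def decode_int4(payload: bytes) -> int:
    """Hypothetical reader for an integer column: accepts exactly 4 bytes."""
    if len(payload) != 4:
        raise ValueError(f"expected 4 bytes for int4, got {len(payload)}")
    return struct.unpack("!i", payload)[0]


print(len(int2_wire), len(int4_wire))
print(decode_int4(int4_wire))
```

Passing `int2_wire` to `decode_int4` raises `ValueError`, whereas in text format both columns would simply have produced the string `7`.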

The binary file format consists of a file header, zero or more tuples containing the row data, and a file trailer. Headers and data are in network byte order.
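For readers who want a concrete picture of the header / tuples / trailer layout, here is a rough sketch of a reader based on the PostgreSQL documentation linked above. It is not CockroachDB's implementation; `parse_copy_binary` and `read_exact` are hypothetical names, and field payloads are left as raw bytes rather than decoded per type.

```python
import io
import struct

# 11-byte signature that opens every COPY BINARY stream, per the PostgreSQL docs.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def read_exact(stream, n):
    """Read exactly n bytes or fail (hypothetical helper)."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of COPY BINARY data")
    return data


def parse_copy_binary(stream):
    """Yield each tuple as a list of raw field bytes (None for NULL)."""
    if read_exact(stream, len(SIGNATURE)) != SIGNATURE:
        raise ValueError("missing PGCOPY signature")
    # Header: 32-bit flags field, then 32-bit header extension length.
    _flags, ext_len = struct.unpack("!II", read_exact(stream, 8))
    read_exact(stream, ext_len)  # skip the extension area
    while True:
        # Each tuple starts with a 16-bit field count; -1 marks the trailer.
        (nfields,) = struct.unpack("!h", read_exact(stream, 2))
        if nfields == -1:
            return
        row = []
        for _ in range(nfields):
            # Each field: 32-bit length (-1 means NULL), then that many bytes.
            (length,) = struct.unpack("!i", read_exact(stream, 4))
            row.append(None if length == -1 else read_exact(stream, length))
        yield row


# Hand-built sample: one tuple holding an int4 value 42 and a NULL.
sample = (
    SIGNATURE
    + struct.pack("!II", 0, 0)      # flags, extension length
    + struct.pack("!h", 2)          # two fields in this tuple
    + struct.pack("!ii", 4, 42)     # field 1: length 4, value 42
    + struct.pack("!i", -1)         # field 2: NULL
    + struct.pack("!h", -1)         # trailer
)
rows = list(parse_copy_binary(io.BytesIO(sample)))
```

Every multi-byte integer uses the `!` (network byte order) prefix, which matches the "Headers and data are in network byte order" statement in the docs.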

Jira issue: CRDB-24562

Epic CRDB-18320

mikeyk commented 1 year ago

Hi @rafiss, we're looking into ways of ingesting large amounts of Cockroach-stored data into Arrow, and pgeon (https://github.com/0x0L/pgeon) is the most promising tool we've found. It uses COPY TO ... BINARY to export data from PG-like databases, so I believe this is the only thing standing in the way of using it with Cockroach. Just wanted to share a use case that maps directly to this issue. If we can be helpful in testing any implementation, please let us know.

kangzhang commented 1 year ago

Ran into a similar issue with ADBC (https://arrow.apache.org/adbc/main/format/specification.html). It also uses the "COPY TO ... BINARY" syntax.

hansihe commented 7 months ago

Hitting the same issue when using DuckDB with Cockroach.