Open aseigo opened 19 hours ago
Idle wondering: is this because the names do not match? Is this another symptom of #39?
Field names seem correct, but COPY FROM currently does not interoperate well with parquet files written by other tools (it requires an exact schema match). Hopefully, this PR will resolve it by allowing a more relaxed schema match.
I also plan to introduce a match_by_position option for COPY FROM to resolve #39.
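For anyone following along, here is a rough sketch of the difference between matching columns by name and by position. It is not pg_parquet code, and the thread does not spell out how the planned option will behave; `match_columns` and its `by_position` flag are hypothetical names used only to illustrate the idea.

```python
from typing import List, Tuple


def match_columns(
    table_cols: List[str],
    file_cols: List[str],
    by_position: bool = False,
) -> List[Tuple[str, str]]:
    """Pair each table column with a file column (hypothetical helper)."""
    if by_position:
        # Positional matching ignores names; only the column count must agree.
        if len(table_cols) != len(file_cols):
            raise ValueError("column count mismatch")
        return list(zip(table_cols, file_cols))

    # Name matching requires every table column to exist in the file.
    missing = [c for c in table_cols if c not in file_cols]
    if missing:
        raise ValueError(f"columns not found in file: {missing}")
    return [(c, c) for c in table_cols]
```

With positional matching, a table column called `ids` can be paired with a file column called `fsq_category_ids`, whereas name matching would reject that pair.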
EDIT: it looks like the field name also mismatches for the list element. So yes, another symptom of #39.
To make it even more tricky, I'm not even sure how to make the field name for the list element match. If the field in the table is changed from fsq_category_ids to element, then the error becomes that there is no element field in the parquet file (which is correct, of course!).
It is nice to match on name where possible ... perhaps a smaller useful change here would be to just ignore the name of the field on list elements, as they aren't nameable (afaik, anyways) in postgresql.
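To make that suggestion concrete, here is a minimal sketch of a schema comparison that matches on name at the top level but skips the name check for list elements. It is not how pg_parquet is implemented; the `Field` class and `fields_match` function are made up for illustration, and the assumption that the table side names its list element after the column is mine, inferred from the error described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    type_name: str                      # e.g. "text" or "list"
    element: Optional["Field"] = None   # set only for list fields


def fields_match(table: Field, file: Field, is_list_element: bool = False) -> bool:
    """Compare two fields, ignoring the name of list elements."""
    # List elements are not nameable in PostgreSQL, so skip their name check.
    if not is_list_element and table.name != file.name:
        return False
    if table.type_name != file.type_name:
        return False
    # If either side has no element, both must have none.
    if table.element is None or file.element is None:
        return table.element is file.element
    return fields_match(table.element, file.element, is_list_element=True)


# Assumed shapes: the table names its element after the column,
# while the parquet file calls the list element "element".
table_col = Field("fsq_category_ids", "list", Field("fsq_category_ids", "text"))
file_col = Field("fsq_category_ids", "list", Field("element", "text"))
matches = fields_match(table_col, file_col)
```

Here `matches` is true even though the element names differ, while a mismatch in the column name or the element type would still be rejected.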
P.S. Thank you for this tool. Being able to drag data into pgsql from parquet files is very snazzy.
Thanks for the feedback. Hopefully we will improve the COPY FROM experience in a few weeks. There will be a few more PRs coming after #39. This issue will also be fixed.
You can try checking out #39, btw. It has a good chance of resolving the element name mismatch (the cast will allow it), and it will be merged next week.
Using the file at s3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/places-00000.snappy.parquet (part of https://opensource.foursquare.com/os-places/, ~434MB in size), pg_parquet shows the following schema:
and creating a table in postgresql as:
then attempting to copy the file into the table with:
results in the following error:
I also tried with a custom type such as
CREATE TYPE element AS (elment text[]);
but that ends up creating a list of records, which also does not match the parquet file.

I'm not sure if I'm doing something wrong here (the documentation is a bit light on how arrays are intended to work?), or if this is a bug in pg_parquet.