adriangb / pgpq

Stream Arrow data into Postgres
MIT License
236 stars 16 forks source link

decimal128 TypeNotSupported #19

Open dev-dsp opened 1 year ago

dev-dsp commented 1 year ago

Hi! My dataset contains decimal128(38, 9) columns.

pgpq fails to initialize the encoder:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: TypeNotSupported { field: "FIELDNAME", tp: Decimal128(38, 9), msg: "unknown type" }', py/src/lib.rs:35:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/root/migration/load.py", line 217, in <module>
    load_parquet_to_postgres(
  File "/root/migration/load.py", line 181, in load_parquet_to_postgres
    _load_pgpq(tbl_directory, table_name)
  File "/root/migration/load.py", line 165, in _load_pgpq
    encoder = ArrowToPostgresBinaryEncoder(tbl.schema)
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: TypeNotSupported { field: "FIELDNAME", tp: Decimal128(38, 9), msg: "unknown type" }
adriangb commented 1 year ago

The error should be much cleaner (in particular without the panick) but the point still holds: pgpq doesn't support Decimal. I think it's theoretically possible to support by converting to Postgres' NUMERIC type, however, I haven't implemented that yet. I can look into it next time I have some time, but PRs are welcome in the meantime.

A workaround you could use would be to convert it to a Utf8 column before passing into pgpq, copying to a temporary table and then doing an INSERT INTO real_table SELECT thecolumn AS NUMERIC FROM temp_table or something like that.

dev-dsp commented 1 year ago

I see, will try, thank you!

bjornasm commented 1 year ago

Just ran into the samme error >>> encoder = ArrowToPostgresBinaryEncoder(dataset.schema) thread '<unnamed>' panicked at 'calledResult::unwrap()on anErrvalue: TypeNotSupported { field: "length", tp: Decimal128(5, 1), msg: "unknown type" }', py/src/lib.rs:35:10 . I thought there was a near 1-1 relation between decimal and numeric just by looking at the definition. Would be great if you found time to support it, I wish I could conribute but I can't punch my self out of a wet paper bag when it comes to Rust.

adriangb commented 1 year ago

I’ll look into it when I have time, I just can’t promise any timeline.

In the meantime, does the workaround I listed above help you?

bjornasm commented 1 year ago

No problem @adriangb I just changed the source datatype as I had just transformed the inputdata.

nsarfati commented 4 weeks ago

Hey @adriangb ! How are you doing? Thank you very much for sharing this library. I've trying to import a parquet file into a PostgresDB and comparing with the CSV way, and the difference in time is crazy. Congrats!

In the other hand, I also find out these open issue but the suggestion you posted last year doesn't work for me, so I am thinking to implement the Decimal support.

Can you give me some hints / advices on where to start and what to have in mind? I have almost 0 experience with Rust but it's always a good time to learn something new.

Thank you very much in advance!