Open vmarchaud opened 3 months ago
First off, thanks for the test and including the file!
I would expect those to come out as buffers as well right now. They are FIXED_LEN_BYTE_ARRAY under the hood.
The other output option would be strings, since a JS number can only represent integers exactly up to 53 bits (2^53 - 1).
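To make the string option concrete, here is a minimal sketch of turning a FIXED_LEN_BYTE_ARRAY decimal into a string. It relies on the Parquet convention that the unscaled value is stored as big-endian two's complement. `decodeDecimal` is a hypothetical helper, not something parquetjs exposes, and it assumes the `scale` has already been read from the column's schema.

```typescript
// Hypothetical helper (not part of parquetjs): decode a Parquet DECIMAL stored
// as FIXED_LEN_BYTE_ARRAY (big-endian two's complement) into a decimal string.
// `scale` is assumed to come from the column's schema metadata.
function decodeDecimal(bytes: Uint8Array, scale: number): string {
  const zero = BigInt(0);
  const eight = BigInt(8);

  // Accumulate the bytes as an unsigned big-endian integer.
  let unscaled = zero;
  for (let i = 0; i < bytes.length; i++) {
    unscaled = (unscaled << eight) | BigInt(bytes[i]);
  }

  // If the top bit is set, reinterpret the value as negative (two's complement).
  if (bytes.length > 0 && (bytes[0] & 0x80) !== 0) {
    unscaled -= BigInt(1) << BigInt(8 * bytes.length);
  }

  const negative = unscaled < zero;
  let digits = (negative ? -unscaled : unscaled).toString();
  if (scale === 0) {
    return (negative ? "-" : "") + digits;
  }

  // Left-pad with zeros so there is at least one digit before the decimal point.
  while (digits.length < scale + 1) {
    digits = "0" + digits;
  }
  const intPart = digits.slice(0, digits.length - scale);
  const fracPart = digits.slice(digits.length - scale);
  return (negative ? "-" : "") + intPart + "." + fracPart;
}
```

Returning a string keeps values above 2^53 - 1 exact; a caller that knows the values are small could still convert with `Number(...)`.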
Looks like the issue is that this file uses a dictionary, and dictionaries get a "toString" (wrongly) applied: https://github.com/LibertyDSNP/parquetjs/blame/91fc71f262c699fdb5be50df2e0b18da8acf8e19/lib/reader.ts#L948
However, removing that looks like it causes some other tests to fail, so some version of it is needed for some values.
All the failing tests, however, are in the test-files.js test, so perhaps some of them are wrong? I might be able to take a deeper look in a few weeks, but perhaps that is enough for you to find the deeper issue faster than I will be able to.
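For anyone following along, here is a small Node sketch of why a blanket "toString" on dictionary values is lossy for binary data: bytes that are not valid UTF-8 get replaced, so the original buffer cannot be recovered. The `decodeDictionaryValue` function and its `isUtf8` flag are hypothetical and only illustrate one possible shape of a fix (stringify only the columns that are actually strings); this is not how the reader is currently written.

```typescript
// Bytes that do not form valid UTF-8, as binary decimal data often will not.
const raw = Buffer.from([0x00, 0xff, 0x80]);

// Buffer#toString() defaults to UTF-8 and replaces invalid sequences with
// U+FFFD, so re-encoding the string does not give back the original bytes.
const roundTripped = Buffer.from(raw.toString());
const preserved = raw.equals(roundTripped); // false

// Hypothetical sketch: only stringify values from columns known to hold
// UTF-8 text, and pass everything else through as a Buffer.
function decodeDictionaryValue(value: Buffer, isUtf8: boolean): string | Buffer {
  return isUtf8 ? value.toString("utf8") : value;
}
```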
Thanks for reporting an issue!
Steps to reproduce
Schema:
File: e2e_datasources.bigquery_test_c40ff3c5-03f4-4213-9d52-fc62e71af0ed_1710089629994_file-000000000000.parquet.gz
Expected behaviour
We should decode age as a number, or at least as a buffer.
Actual behaviour
Any other comments?
I'm not familiar with Parquet encodings; I actually started working with it this afternoon, so I might be doing something wrong. I would expect the number to be decoded from the decimal. However, I've seen in other tests that, since decimals are encoded as FIXED_LEN_BYTE_ARRAY, in my case it should be decoded as a buffer, but that's not the case either.