LibertyDSNP / parquetjs

Fully asynchronous, pure JavaScript implementation of the Parquet file format with additional features
MIT License

Decimal Support for Binary Precision #91

Status: Open. Opened by wilwade 1 year ago.

wilwade commented 1 year ago

Currently, this library supports reading and writing DECIMAL values only when the precision is <= 18.

Annotated excerpt from the Parquet spec (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal):

DECIMAL can be used to annotate the following types:

  • [x] int32: for 1 <= precision <= 9
  • [x] int64: for 1 <= precision <= 18; precision < 10 will produce a warning
  • [x] fixed_len_byte_array: precision is limited by the array size. Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits
  • [ ] binary: precision is not limited, but is required. The minimum number of bytes to store the unscaled value should be used.
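The rules above translate directly into integer arithmetic. The sketch below assumes unscaled values are handled as JavaScript `BigInt`s, so precisions above 18 stay exact. It shows how the fixed_len_byte_array digit limit can be computed, and how a BINARY-backed value could be encoded with the minimum number of bytes, using the big-endian two's-complement layout the Parquet spec prescribes for byte-array DECIMALs. The function names (`maxPrecisionForFixedLen`, `minBytesForUnscaled`, `encodeUnscaled`) are hypothetical and are not part of the parquetjs API.

```javascript
// Illustrative helpers only (hypothetical names, not the parquetjs API).
// Unscaled DECIMAL values are handled as BigInt so precision > 18 is exact.

// Largest precision a FIXED_LEN_BYTE_ARRAY of length n can hold,
// i.e. floor(log10(2^(8n - 1) - 1)). For any integer x >= 1,
// floor(log10(x)) equals (number of decimal digits of x) - 1.
function maxPrecisionForFixedLen(n) {
  if (!Number.isInteger(n) || n < 1) throw new RangeError("n must be >= 1");
  const maxUnscaled = (1n << BigInt(8 * n - 1)) - 1n;
  return maxUnscaled.toString().length - 1;
}

// Smallest byte count k whose signed (two's-complement) range
// [-2^(8k-1), 2^(8k-1) - 1] contains v. The spec asks BINARY
// DECIMALs to use this minimum.
function minBytesForUnscaled(v) {
  let k = 1;
  for (;;) {
    const limit = 1n << BigInt(8 * k - 1);
    if (v >= -limit && v <= limit - 1n) return k;
    k++;
  }
}

// Encode v as big-endian two's complement using the minimal byte count.
function encodeUnscaled(v) {
  const k = minBytesForUnscaled(v);
  // Negative values map to 2^(8k) + v, their k-byte two's-complement bit pattern.
  let u = v < 0n ? (1n << BigInt(8 * k)) + v : v;
  const out = new Uint8Array(k);
  for (let i = k - 1; i >= 0; i--) {
    out[i] = Number(u & 0xffn); // lowest remaining byte goes last
    u >>= 8n;
  }
  return out;
}
```

As a sanity check, `maxPrecisionForFixedLen(4)` gives 9 and `maxPrecisionForFixedLen(8)` gives 18, matching the int32 and int64 limits in the list. Note that `encodeUnscaled(128n)` needs two bytes (`0x00 0x80`): the leading zero byte keeps the sign bit clear so the value is not read back as negative.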

Test Files:

Related Issues:

YECHUNAN commented 10 months ago

I made a PR attempting to add rudimentary support for Decimal fields that are represented by byte arrays, which may have precision over 18.
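For the read side, a byte-array DECIMAL is a big-endian two's-complement unscaled integer plus the column's scale. The sketch below is not the code from that PR; it is a hypothetical illustration (`decodeDecimal` and `formatScaled` are made-up names) that returns a decimal string. Returning a string is a design choice made here because a JavaScript `Number` cannot represent integers above 2^53 exactly, which rules it out for precision over 18.

```javascript
// Hypothetical helpers, not the PR's implementation or the parquetjs API.

// Decode a big-endian two's-complement unscaled value, then apply the scale.
function decodeDecimal(bytes, scale) {
  if (bytes.length === 0) throw new RangeError("empty DECIMAL value");
  let u = 0n;
  for (const b of bytes) u = (u << 8n) | BigInt(b);
  // If the top bit of the first byte is set, the value is negative.
  const v = bytes[0] & 0x80 ? u - (1n << BigInt(8 * bytes.length)) : u;
  return formatScaled(v, scale);
}

// Insert the decimal point `scale` digits from the right (scale >= 0).
function formatScaled(v, scale) {
  const neg = v < 0n;
  let digits = (neg ? -v : v).toString();
  if (scale > 0) {
    // Left-pad so there is at least one digit before the point, e.g. "0.005".
    digits = digits.padStart(scale + 1, "0");
    digits = digits.slice(0, -scale) + "." + digits.slice(-scale);
  }
  return (neg ? "-" : "") + digits;
}
```

For example, the bytes `0x30 0x39` (unscaled 12345) with scale 2 decode to `"123.45"`, and `0xff 0x85` (unscaled -123) with scale 2 decode to `"-1.23"`.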

craxal commented 3 months ago

I suspect that the earlier pull request introduced a regression affecting DECIMAL values. Some users are reporting the following error:

missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)

From what I can gather, this occurs even when a schema has no FIXED_LEN_BYTE_ARRAY-backed DECIMAL columns (in one case, only INT64-backed ones).
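The error message itself says `typeLength` is required for FIXED_LEN_BYTE_ARRAY, so an INT64-backed DECIMAL should never trigger it. The sketch below illustrates that expected behavior only; it is not the actual fix released in v1.6.1. The function `validateDecimalField` and the field shape (`primitiveType`, `precision`, `typeLength`) are hypothetical and do not reflect the real parquetjs schema format. `BYTE_ARRAY` is Parquet's physical type name for what the spec calls "binary".

```javascript
// Hypothetical validation sketch (not parquetjs code): typeLength is only
// demanded when the DECIMAL is backed by FIXED_LEN_BYTE_ARRAY.
function validateDecimalField(name, field) {
  const { primitiveType, precision, typeLength } = field;
  if (!Number.isInteger(precision) || precision < 1) {
    throw new Error(`${name}: precision must be a positive integer`);
  }
  switch (primitiveType) {
    case "INT32":
      if (precision > 9) throw new Error(`${name}: INT32 supports precision <= 9`);
      break;
    case "INT64":
      if (precision > 18) throw new Error(`${name}: INT64 supports precision <= 18`);
      break;
    case "FIXED_LEN_BYTE_ARRAY": {
      if (!Number.isInteger(typeLength) || typeLength < 1) {
        throw new Error("missing option: typeLength (required for FIXED_LEN_BYTE_ARRAY)");
      }
      // Max digits = floor(log10(2^(8n - 1) - 1)) = digit count - 1.
      const maxUnscaled = (1n << BigInt(8 * typeLength - 1)) - 1n;
      const maxPrecision = maxUnscaled.toString().length - 1;
      if (precision > maxPrecision) {
        throw new Error(`${name}: typeLength ${typeLength} supports precision <= ${maxPrecision}`);
      }
      break;
    }
    case "BYTE_ARRAY":
      break; // precision is not limited for binary-backed DECIMALs
    default:
      throw new Error(`${name}: DECIMAL cannot annotate ${primitiveType}`);
  }
}
```

With this shape, `{ primitiveType: "INT64", precision: 10 }` passes without any `typeLength`, while a FIXED_LEN_BYTE_ARRAY field that omits `typeLength` raises the error quoted above.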

wilwade commented 3 months ago

@craxal the fix from @JasonYeMSFT released in v1.6.1 (just this morning) should fix it.

craxal commented 3 months ago

@wilwade Ah, yes, I think it does. Just tested it myself. Sorry, I thought the pull request had already been released.