LibertyDSNP / parquetjs

Fully asynchronous, pure JavaScript implementation of the Parquet file format with additional features
MIT License
43 stars 24 forks source link

scale for DECIMAL field cannot be 0 #87

Closed JasonYeMSFT closed 1 year ago

JasonYeMSFT commented 1 year ago

Thanks for reporting an issue!

Steps to reproduce

Create a schema where a field specifies type to DECIMAL and scale to 0. Write a row including that field.

Expected behaviour

Works.

Actual behaviour

Fails with this error: Failed to generate test file invalid schema for type: DECIMAL, for Column: decimal, scale is required

Any other comments?

Maybe it's not supported for writing?

Either way, I think this line of code where checking for scale is not taking the value 0 into account. https://github.com/LibertyDSNP/parquetjs/blob/main/lib/schema.ts#L232

According to https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal, "scale must be zero or a positive integer less than or equal to the precision." The library should allow 0 as the scale value. And it seems the scale should be optional as well since the doc specifies a default value 0.

JasonYeMSFT commented 1 year ago

I also noticed that the precision validation logic is not 100% compliant to the spec. For fixed_len_byte_array and binary types of fields, the precision can exceed 18 but the library simply rejects all precision values greater than 18. Could be something to support in the future.

wilwade commented 1 year ago

@JasonYeMSFT I figured that out this morning as well. #81 has that issue once I got down to it.

I added a lot of validation logic and writing DECIMAL (<=18) in #90 if you want to take a look!