awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

decimal type losing precision after select_fields transformation #5

Open bakulinav opened 6 years ago

bakulinav commented 6 years ago

I have an RDS table with a decimal(10,4) field. This table is used as a source for ETL, but the select_fields transformation reduces the precision to decimal(5,4) in the following flow:

...
>>> datasource = glueContext.create_dynamic_frame.from_catalog(database = "<rds-source>", table_name = "<table>")
>>> datasource.schema
StructType([..., Field(Bounty, DecimalType(10, 4, {}), {}), ...], {})

>>> mapping = datasource.apply_mapping([..., ("bounty", "decimal(10,4)", "bounty", "decimal(10,4)"), ...])
>>> mapping.schema
StructType([..., Field(bounty, DecimalType(10, 4, {}), {}), ...], {})

>>> selected = mapping.select_fields([..., "bounty", ...])
>>> selected.schema
StructType([..., Field(bounty, DecimalType(5, 4, {}), {}), ...], {})
...
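
To make the impact concrete: decimal(p, s) holds at most p digits in total, s of them after the decimal point, so decimal(10,4) allows 6 integer digits while decimal(5,4) allows only 1. The sketch below is plain Python and does not call Glue or Spark; `fits_decimal` is a hypothetical helper written only for illustration. It checks whether a value is representable under a given precision and scale. How Glue actually treats values that no longer fit the narrowed type is not shown in this report.

```python
from decimal import Decimal


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """Return True if `value` is representable as decimal(precision, scale).

    Hypothetical helper for illustration only; not part of awsglue or PySpark.
    """
    # The value must not need more than `scale` fractional digits.
    step = Decimal(1).scaleb(-scale)  # e.g. scale=4 -> Decimal('0.0001')
    if value.quantize(step) != value:
        return False
    # The integer part may use at most (precision - scale) digits.
    limit = Decimal(10) ** (precision - scale)
    return abs(value) < limit


sample = Decimal("123456.7890")
print(fits_decimal(sample, 10, 4))  # schema reported before select_fields
print(fits_decimal(sample, 5, 4))   # schema reported after select_fields
```

Under these assumptions, any value of 10 or more in absolute terms is representable in the source column but not in the type reported after select_fields.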

bsowell commented 6 years ago

Thanks for reporting this. We will investigate.

venatir commented 3 years ago

Did you investigate?

fabriz-io commented 1 year ago

It took us some time to figure out that the precision changes in a "select_fields" operation. Do you think there will be an update on this issue? It would be nice to know for our future planning.