Open guyernest opened 1 year ago
Can you point to the Glue docs (or CloudFormation docs for the Glue CFN) where these are described?
Thank you @indrora for your attention.
If you check the TableInput in Glue CFN, you can see that it has Parameters and StorageDescriptor.
The StorageDescriptor CFN also has a Parameters section.
This is the source of the confusion as some of the parameters should go to the TableInput section and some to the StorageDescriptor.
Here is another link to the specific parameters that are needed for Athena: https://docs.aws.amazon.com/athena/latest/ug/partition-projection-setting-up.html
As described above about the possible options to solve it, we can add a general option to add parameters to the TableInput in Glue or to make it more specific for the parameters that are defined for the projection for Athena.
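To make the two levels concrete, here is a minimal sketch in plain TypeScript (no CDK import) of how a general option could route parameters between the two maps. The `TableInputSketch` type, the `buildTableInput` helper, the prefix rule, and the `example.storage.key` entry are all hypothetical and only for illustration; they are not part of the CDK or of CloudFormation.

```typescript
// Hypothetical, simplified view of the two Parameters maps in AWS::Glue::Table.
type ParameterMap = Record<string, string>;

interface TableInputSketch {
  Name: string;
  // Table-level map: TableInput.Parameters
  Parameters: ParameterMap;
  StorageDescriptor: {
    // Storage-level map: TableInput.StorageDescriptor.Parameters
    Parameters: ParameterMap;
  };
}

// Assumption: keys with these prefixes are Athena partition-projection
// table properties and therefore belong in the table-level map.
const TABLE_LEVEL_PREFIXES = ["projection.", "storage.location.template"];

function isTableLevel(key: string): boolean {
  return TABLE_LEVEL_PREFIXES.some((prefix) => key.startsWith(prefix));
}

// Splits one flat parameter map into the two sections.
function buildTableInput(name: string, params: ParameterMap): TableInputSketch {
  const tableParams: ParameterMap = {};
  const storageParams: ParameterMap = {};
  for (const [key, value] of Object.entries(params)) {
    if (isTableLevel(key)) {
      tableParams[key] = value;
    } else {
      storageParams[key] = value;
    }
  }
  return {
    Name: name,
    Parameters: tableParams,
    StorageDescriptor: { Parameters: storageParams },
  };
}

// "example.storage.key" is a made-up key standing in for any storage parameter.
const example = buildTableInput("events", {
  "projection.enabled": "true",
  "projection.dt.type": "date",
  "example.storage.key": "value",
});
```

A dedicated option for table-level parameters would avoid guessing by prefix; the prefix rule here only illustrates that the two maps are separate.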
I ran into this when I tried to add the skip.header.line.count property to the table using storage_parameters=[glue.StorageParameter.skip_header_line_count(1)]
As you showed it was written into the wrong parameter section.
After copying it to the correct place in the template and deploying it manually, the table property was correctly configured as expected.
Thank you for fixing it.
Describe the bug
The TableInput section in the Glue AWS::Glue::Table has two different Parameters sections, one for the storage and one for the table. The current implementation of the S3-Table puts all the custom parameters into the StorageDescriptor section Parameters and leaves the other hard-coded.
The use case is for dynamic-partitioning, which uses projection.<dynamic-partitioning>.format and similar parameters to define the way that Glue (and Athena) will parse the dynamic partitioning field. This is a common way to archive data into S3 using Kinesis Firehose.
Expected Behavior
When using the following code in the CDK:
I expect to get the following CFN snippet:
Please note that the parameters are under the Table Input.
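As a rough illustration of the placement described here, the sketch below builds the projection keys for one date-typed partition column and puts them in the table-level map. It is plain TypeScript rather than CDK code; the `dateProjection` helper, the table name, the column name `dt`, and the format and range values are assumptions chosen for the example, not values from the stack in this issue.

```typescript
type ParameterMap = Record<string, string>;

// Hypothetical helper: Athena partition-projection keys for one date column.
function dateProjection(column: string, format: string, range: string): ParameterMap {
  return {
    "projection.enabled": "true",
    [`projection.${column}.type`]: "date",
    [`projection.${column}.format`]: format,
    [`projection.${column}.range`]: range,
  };
}

// The projection keys sit under TableInput.Parameters,
// not under TableInput.StorageDescriptor.Parameters.
const expectedTableInput = {
  Name: "firehose_archive", // assumed table name
  Parameters: dateProjection("dt", "yyyy/MM/dd", "2023/01/01,NOW"),
  StorageDescriptor: {
    Parameters: {} as ParameterMap,
  },
};

console.log(JSON.stringify(expectedTableInput, null, 2));
```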
Current Behavior
Instead, I get the following stack snippet:
Please note that the dynamic partitioning parameters are added to the wrong parameters section.
Reproduction Steps
Use similar code in your stack definition under /lib:
Possible Solution
I can think of three options to solve the bug:
parameters to the tableInput and not only to the storageParameters - something like tableParameters.
tableInput object.
Additional Information/Context
As mentioned above, this is part of a common pipeline of replication from a DynamoDB table to S3 to allow analytical queries on that data from Athena. In the example above (extendedS3DestinationConfiguration) the user can define the format of the dynamic partitioning of the data in Firehose. If we fix this issue with a similar focused method (option 3 above), it will be easy to extend constructs such as KinesisStreamsToKinesisFirehoseToS3, AwsDynamoDBKinesisStreamsS3 or KinesisFirehoseToS3 to support the creation of the Glue table on top of the data in S3.
CDK CLI Version
2.99.0 (build 0aa1096)
Framework Version
No response
Node.js Version
v16.18.1
OS
MacOS
Language
Typescript
Language Version
No response
Other information
No response