aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.7k stars 3.93k forks

aws_glue : CfnCrawler missing data source HudiTarget both in CFN and CDK. HudiTarget only available in GUI #31133

Open lorenzo-necto opened 3 months ago

lorenzo-necto commented 3 months ago

Describe the feature

Hello. If I set up my crawler from the GUI console, I can choose a Hudi S3 table as a data source to be crawled. This is not yet possible in either CFN or CDK. CDK already has Iceberg and Delta Lake as sources for the crawler, but not Hudi: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_glue.CfnCrawler.html

Use Case

I would like to use the Hudi serde, not Parquet, when crawling my Hudi S3 folders. Although this is available via the GUI, it is still not present in CDK or CFN.

Proposed Solution

P1. At a minimum, add HudiTarget to CloudFormation to complete the set of data lake framework formats (given that the Iceberg and Delta targets are already present in CFN).

P2. Add the target type to CDK.
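
For illustration, here is a rough sketch of what the requested target could look like inside the crawler's `Targets` property. The field names are borrowed from the Glue API's `HudiTarget` structure (`Paths`, `ConnectionName`, `Exclusions`, `MaximumTraversalDepth`), camel-cased the way CDK's existing Iceberg target is. The `hudiTargets` key, the `*Sketch` types, the `hudiTargets()` helper, and the bucket path are all assumptions made up for this example; none of them exist in aws-cdk-lib at the time of this issue.

```typescript
// Local stand-in types only; nothing here is imported from aws-cdk-lib.
// Field names mirror the Glue API's HudiTarget structure (an assumption
// about how CFN/CDK would eventually expose it).
interface HudiTargetSketch {
  paths: string[];
  connectionName?: string;
  exclusions?: string[];
  maximumTraversalDepth?: number;
}

// Hypothetical shape of the crawler's Targets property with Hudi added.
interface CrawlerTargetsSketch {
  hudiTargets?: HudiTargetSketch[];
}

// Hypothetical helper: build a Targets object for one or more Hudi S3 paths.
function hudiTargets(paths: string[], maxDepth: number): CrawlerTargetsSketch {
  if (paths.length === 0) {
    throw new Error("at least one S3 path is required");
  }
  return {
    hudiTargets: [{ paths, maximumTraversalDepth: maxDepth }],
  };
}

// Placeholder bucket and prefix.
const targets = hudiTargets(["s3://example-bucket/hudi/orders/"], 5);
console.log(JSON.stringify(targets, null, 2));
```

If CloudFormation adopts a different property name or shape, only the two sketch interfaces would need to change.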

Other Information

No response

Acknowledgements

CDK version used

"aws-cdk-lib": "^2.115.0",

Environment details (OS name and version, etc.)

MacBook Pro M1

pahud commented 3 months ago

At a minimum, add HudiTarget to CloudFormation to complete the set of data lake framework formats (given that the Iceberg and Delta targets are already present in CFN).

Looks like it's still missing in CloudFormation? Please submit the feature request to cloudformation-coverage-roadmap to help the CFN team prioritize this support. I am making this a p2 feature request pending CFN support. As soon as we have that support, CDK should be able to support it immediately.
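
As a side note for anyone following along: once CloudFormation accepts the property, the usual way to set a property that the typed L1 props do not yet expose is the `addPropertyOverride` escape hatch on the L1 construct. Below is a minimal sketch, assuming the CFN property ends up at `Targets.HudiTargets` with a `Paths` list (an assumption mirroring the Glue API, not something CloudFormation supports at the time of this comment). `Overridable`, `addHudiTargets`, and `RecordingCrawler` are hypothetical names defined here so the snippet runs without aws-cdk-lib; a real `CfnCrawler` should satisfy `Overridable` structurally.

```typescript
// Minimal structural view of an L1 construct's escape hatch.
// CfnResource in aws-cdk-lib exposes addPropertyOverride(propertyPath, value).
interface Overridable {
  addPropertyOverride(propertyPath: string, value: unknown): void;
}

// Hypothetical helper: the "Targets.HudiTargets" path and the "Paths" key
// are assumptions about the future CloudFormation schema.
function addHudiTargets(crawler: Overridable, paths: string[]): void {
  crawler.addPropertyOverride("Targets.HudiTargets", [{ Paths: paths }]);
}

// Stand-in for CfnCrawler that only records overrides, for demonstration.
class RecordingCrawler implements Overridable {
  readonly overrides: Record<string, unknown> = {};

  addPropertyOverride(propertyPath: string, value: unknown): void {
    this.overrides[propertyPath] = value;
  }
}

const crawler = new RecordingCrawler();
addHudiTargets(crawler, ["s3://example-bucket/hudi/orders/"]);
console.log(JSON.stringify(crawler.overrides, null, 2));
```

Until CloudFormation recognizes the property, a template containing such an override would be expected to fail at deployment, so this only becomes useful after the CFN side lands.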