aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0

Add table_type property to create_ctas_table to support iceberg tables #2959

Open Samreay opened 3 weeks ago

Samreay commented 3 weeks ago

Is your idea related to a problem? Please describe.
Iceberg has a lot of nice features we'd like to use, and Athena's CREATE TABLE AS (CTAS) now supports it out of the box. It would be great if create_ctas_table allowed us to specify the table type as either Hive or Iceberg, as per the spec linked above.

Describe the solution you'd like
create_ctas_table gains a new kwarg, table_type. Depending on whether the table type is Hive or Iceberg, it would also switch external_location to location and partitioned_by to partitioning in the generated table properties.
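A minimal sketch of what the per-table-type switch could look like, rendered as the WITH (...) clause of the CTAS statement. build_ctas_with_clause is a hypothetical helper, not part of awswrangler's API; the property names follow Athena's CTAS table properties (external_location and partitioned_by for Hive; location, is_external = false and partitioning for Iceberg). Bucket, database and table names are placeholders.

```python
from typing import List, Optional


def build_ctas_with_clause(
    table_type: str,
    location: str,
    partition_cols: Optional[List[str]] = None,
    storage_format: str = "PARQUET",
) -> str:
    """Hypothetical helper: render the WITH (...) clause of an Athena CTAS
    statement, choosing property names based on the table type."""
    table_type = table_type.upper()
    if table_type not in ("HIVE", "ICEBERG"):
        raise ValueError(f"Unsupported table_type: {table_type!r}")

    props = [f"table_type = '{table_type}'"]
    if table_type == "ICEBERG":
        # Iceberg CTAS uses `location`, must be non-external, and uses `partitioning`.
        props.append(f"location = '{location}'")
        props.append("is_external = false")
        partition_key = "partitioning"
    else:
        # Hive CTAS uses `external_location` and `partitioned_by`.
        props.append(f"external_location = '{location}'")
        partition_key = "partitioned_by"

    props.append(f"format = '{storage_format}'")
    if partition_cols:
        cols = ", ".join(f"'{c}'" for c in partition_cols)
        props.append(f"{partition_key} = ARRAY[{cols}]")
    return "WITH (\n    " + ",\n    ".join(props) + "\n)"


# Placeholder names; only the WITH clause differs between Hive and Iceberg.
sql = (
    "CREATE TABLE my_db.my_table\n"
    + build_ctas_with_clause("iceberg", "s3://my-bucket/my_table/", ["dt"])
    + "\nAS SELECT * FROM my_db.source_table"
)
print(sql)
```

Keeping the caller-facing arguments (location, partition_cols) the same for both table types and translating them internally is one way to keep the existing signature stable while adding table_type.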

Alternatively, the function could be made more general by adding an additional_table_properties kwarg, letting end users pass whatever table properties they need instead of being restricted to a fixed subset of keywords.

I'd be happy to submit a PR for this if the concept sounds good.

jaidisido commented 2 weeks ago

This makes sense to me, and a contribution would be welcome, @Samreay.