cbts-alec-johnson closed this issue 1 year ago.
Guess this is what you need: https://github.com/apache/hudi/pull/8740/files
Yes, this is what I need. Also, I think you may have labeled this gcp-support instead of aws-support?
@cbts-alec-johnson I need to implement this. Could you please share your configuration for syncing the comments?
@TrustOkoroego I believe the column comments are synced correctly when a table is updated. However, they are not synced when the table is created, because the comment is set to empty in the function linked in the issue description.
Describe the problem you faced
Column comments are not synced to the AWS Glue Data Catalog when setting hoodie.datasource.hive_sync.sync_comment to true and adding column comments in the DataFrame schema metadata.
To Reproduce
Steps to reproduce the behavior:
Set hoodie.datasource.hive_sync.sync_comment to true.
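For reference, here is a minimal sketch of the kind of write configuration and schema the report describes, written as plain Python data. Only hoodie.datasource.hive_sync.sync_comment comes from the report. The table, database, and field names are hypothetical, and the other keys are our assumption of a typical Hudi 0.12 Glue sync setup, so check them against the Hudi configuration reference.

```python
# Hypothetical names throughout; only the sync_comment key is from the report.
# The remaining keys are assumed for a Hudi 0.12.x write with Glue meta sync.
hudi_options = {
    "hoodie.table.name": "comment_test",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.meta.sync.enable": "true",
    "hoodie.meta.sync.client.tool.class": "org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool",
    "hoodie.datasource.hive_sync.database": "default",
    "hoodie.datasource.hive_sync.table": "comment_test",
    "hoodie.datasource.hive_sync.sync_comment": "true",
}

# Spark stores a column comment in the field metadata under the "comment" key.
# This dict has the same shape as json.loads(df.schema.json()).
schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False,
         "metadata": {"comment": "Primary key"}},
        {"name": "ts", "type": "long", "nullable": False,
         "metadata": {"comment": "Event time in epoch millis"}},
        {"name": "name", "type": "string", "nullable": True, "metadata": {}},
    ],
}

# In PySpark (rows and base_path are placeholders), the first write creates
# the table and triggers the Glue sync:
# spark.createDataFrame(rows, StructType.fromJson(schema)) \
#     .write.format("hudi").options(**hudi_options).mode("overwrite").save(base_path)
```

Because the comments are reported missing only at table creation, check the table in Glue after the first (overwrite) write.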
Expected behavior
Setting hoodie.datasource.hive_sync.sync_comment to true when the DataFrame has column comments should sync the comments to the Glue Catalog.
Environment Description
Hudi version : 0.12.1
Spark version : 3.3.0
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : no
Additional context
It looks like the comment is hard-coded to empty in the function linked below. It should instead be taken from the DataFrame schema metadata.
https://github.com/apache/hudi/blob/c6dadd4cb5d82d4afa9dbfd4b089c02ebe06c14c/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java#L446-L457
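A minimal sketch of the change suggested above, written in Python rather than the Java of AWSGlueCatalogSyncClient. It reads comments from the Spark schema JSON and attaches them to Glue-style column definitions instead of leaving them empty. The helper names comments_from_spark_schema and to_glue_columns are hypothetical. The dicts only mimic the Name/Type/Comment shape of a Glue Column, and skipping partition fields assumes they are registered separately under the table's PartitionKeys. This is not Hudi's actual code.

```python
import json
from typing import Dict, Iterable, List, Tuple


def comments_from_spark_schema(schema_json: str) -> Dict[str, str]:
    """Map column name -> comment from Spark StructType JSON (as produced by
    df.schema.json()); columns without a "comment" metadata entry are skipped."""
    schema = json.loads(schema_json)
    return {
        field["name"]: field["metadata"]["comment"]
        for field in schema["fields"]
        if "comment" in field.get("metadata", {})
    }


def to_glue_columns(
    columns: Iterable[Tuple[str, str]],  # (name, hive_type) pairs in table order
    comments: Dict[str, str],
    partition_fields: Iterable[str] = (),
) -> List[dict]:
    """Build Glue-style column dicts, carrying each column's comment from the
    schema instead of hard-coding an empty one. Partition fields are skipped
    on the assumption that they are listed under PartitionKeys instead."""
    partitions = set(partition_fields)
    result = []
    for name, hive_type in columns:
        if name in partitions:
            continue
        column = {"Name": name, "Type": hive_type}
        if name in comments:
            # Only set Comment when the schema actually has one.
            column["Comment"] = comments[name]
        result.append(column)
    return result
```

For example, with a schema where id has the comment "Primary key" and dt is a partition field without a comment, to_glue_columns([("id", "bigint"), ("dt", "string")], comments, partition_fields=["dt"]) yields a single entry for id that carries the comment. Omitting Comment when none exists, rather than sending an empty string, is a design choice of this sketch.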