parisni closed this issue 2 years ago
@parisni This is a known issue. We did not see a strong use case for adding comments. May I know your use case? Perhaps we can take it up in a future release.
Our use case is to improve the quality of our lakehouse. Hudi tables are often accessible to end users (they allow us to apply GDPR treatment), and column/table comments are a neat way to improve the quality of data analysts' work and the user experience. Also, our upstream data sources sometimes do have comments (Parquet metadata / regular Hive metastore comments), and when the data is transformed into Hudi, that information is lost.
Would this work for you: https://github.com/apache/hudi/pull/4960, or are you looking for something else?
Indeed, this is exactly what I am looking for! Thanks.
On Tue, 2022-04-26 at 19:33 -0700, Sivabalan Narayanan wrote:
> would this work for you https://github.com/apache/hudi/pull/4960 or are you looking for something else?
@parisni Glad to know that. Closing this issue. Let us know if you have additional questions.
When a Spark schema has metadata with a comment field, the Spark writer propagates the comment into the metastore.
Other metastore clients (Hive, Presto) can then describe the table and see the comments.
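As a minimal sketch of where such a comment lives: Spark serializes a schema to JSON with a per-field `metadata` object, and a column comment sits under its `comment` key. The column names and the helper `column_comments` below are hypothetical and only illustrate that layout.

```python
import json

# Hypothetical schema, written in the JSON layout Spark uses for a StructType.
SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {"name": "id", "type": "long", "nullable": false,
     "metadata": {"comment": "Primary key of the record"}},
    {"name": "name", "type": "string", "nullable": true, "metadata": {}}
  ]
}
"""


def column_comments(schema_json: str) -> dict:
    """Return {column name: comment} for top-level fields that carry a comment."""
    schema = json.loads(schema_json)
    return {
        field["name"]: field["metadata"]["comment"]
        for field in schema["fields"]
        if "comment" in field.get("metadata", {})
    }


if __name__ == "__main__":
    print(column_comments(SCHEMA_JSON))
```

Only `id` carries a comment here, so it is the only column a metastore client would show a comment for.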
It turns out Hudi does not support them: when such a comment is added to the schema, the resulting table does not get the comment.
Digging into the source code, the schema comes either from the Hudi commit metadata in Avro format or from reading the last Parquet file. However, the initial comment is not present in either.
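For illustration only: Avro field definitions accept an optional `doc` attribute, so one conceivable way to keep a comment in an Avro-format schema is to copy it there. The sketch below is not Hudi's code. The type mapping, the function names, and the record name are assumptions, and only a few primitive types are covered.

```python
# Simplified, hypothetical mapping of a few Spark primitive type names to Avro.
SPARK_TO_AVRO = {
    "long": "long",
    "integer": "int",
    "string": "string",
    "double": "double",
    "boolean": "boolean",
}


def spark_field_to_avro(field: dict) -> dict:
    """Convert one Spark JSON field to an Avro field, keeping the comment as `doc`."""
    avro_type = SPARK_TO_AVRO[field["type"]]
    avro_field = {"name": field["name"]}
    if field.get("nullable", True):
        # Nullable columns become a union with null and default to null.
        avro_field["type"] = ["null", avro_type]
        avro_field["default"] = None
    else:
        avro_field["type"] = avro_type
    comment = field.get("metadata", {}).get("comment")
    if comment is not None:
        avro_field["doc"] = comment
    return avro_field


def spark_schema_to_avro(schema: dict, record_name: str) -> dict:
    """Build an Avro record schema from a Spark StructType in JSON (dict) form."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [spark_field_to_avro(f) for f in schema["fields"]],
    }


if __name__ == "__main__":
    spark_schema = {
        "type": "struct",
        "fields": [
            {"name": "id", "type": "long", "nullable": False,
             "metadata": {"comment": "Primary key of the record"}},
            {"name": "name", "type": "string", "nullable": True, "metadata": {}},
        ],
    }
    print(spark_schema_to_avro(spark_schema, "example_record"))
```

With a conversion like this the comment would survive in the Avro schema, whereas the behaviour described above drops it before the table is synced.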