apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

how to update nested column value with spark #10557

Open panda403 opened 5 months ago

panda403 commented 5 months ago

Query engine

Spark with Iceberg

Question

Here is my Iceberg table schema:

root
 |-- partition_time: long (nullable = true)
 |-- base: struct (nullable = true)
 |    |-- report_time_ms: long (nullable = true)
 |    |-- dev_info: struct (nullable = true)
 |    |    |-- env: string (nullable = true)
 |    |    |-- user: string (nullable = true)
 |    |    |-- pwd: string (nullable = true)
 |    |-- com_info: struct (nullable = true)
 ...........

I want to update the value of the column base.dev_info.env to 'prod'. When I use

df.withColumn("new_env", F.when(F.col("type") === 1, "").otherwise(F.col("base.dev_info.env")))

the value of new_env is correct, but when I use

df.withColumn("base.dev_info.env", F.when(F.col("type") === 1, "").otherwise(F.col("base.dev_info.env")))

base.dev_info.env does not change. It seems Spark can't identify the nested column. How can I update a nested column value with Spark?

pvij commented 1 month ago

@panda403 This question seems to be Spark-specific rather than related to Iceberg. You can use the withField method on a Spark DataFrame Column (https://spark.apache.org/docs/3.5.2/api/java/org/apache/spark/sql/Column.html#withField-java.lang.String-org.apache.spark.sql.Column-) to update nested values.