Netflix / iceberg

Iceberg is a table format for large, slow-moving tabular data
Apache License 2.0

Problem inserting data into a table with structs (iceberg-spark) #116

Closed: cccs-dm closed this issue 3 years ago

cccs-dm commented 3 years ago

Hello!

There seems to be a bug when inserting data into a table that contains a Spark StructField (a struct column). Creating the table works without any issues:

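from pyspark.sql import SQLContext
from pyspark.sql import functions as F

sc = spark.sparkContext
df = SQLContext(sc).range(0, 1000)
# Add a struct column holding a single literal field (Spark names it "col1")
df = df.withColumn("MARK", F.struct(F.lit(7)))
df.write.option("path", outputPath).format("iceberg").saveAsTable(tableName)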

But when I try to insert more data into this table, I get an exception:

df.write.mode("overwrite").insertInto(tableName)


AnalysisException                         Traceback (most recent call last)
in
----> 1 df.write.mode("overwrite").insertInto(tableName)

/usr/local/spark/python/pyspark/sql/readwriter.py in insertInto(self, tableName, overwrite)
    838         if overwrite is not None:
    839             self.mode("overwrite" if overwrite else "append")
--> 840         self._jwrite.insertInto(tableName)
    841
    842     @since(1.4)

/usr/local/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

/usr/local/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    135             # Hide where the exception came from that shows a non-Pythonic
    136             # JVM exception message.
--> 137             raise_from(converted)
    138         else:
    139             raise

/usr/local/spark/python/pyspark/sql/utils.py in raise_from(e)

AnalysisException: unresolved operator 'OverwriteByExpression RelationV2[id#42L, MARK#43] spark_catalog.default.iceberg_spark_bug, true, false;;
'OverwriteByExpression RelationV2[id#42L, MARK#43] spark_catalog.default.iceberg_spark_bug, true, false
+- Project [id#31L, struct(col1, 7) AS MARK#33]
   +- Range (0, 1000, step=1, splits=Some(12))