apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0
6.49k stars 2.24k forks source link

NoSuchMethodError: org.apache.spark.sql.catalyst.trees.Origin while running expire snapshot on Iceberg in Databricks #11534

Closed as-sher closed 1 week ago

as-sher commented 1 week ago

Query engine

Spark 3.5 databricks Runtime 14.3 and 15.3

Question

While trying to run

spark.sql("CALL iceberg_catalog.system.expire_snapshots(table => 'iceberg_catalog.d11_stitch.rewards_third_party_impact_base_query', older_than => TIMESTAMP '2024-03-06 00:00:00.000')")

to this Im getting

Py4JJavaError: An error occurred while calling o415.sql.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.trees.Origin.<init>(Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;Lscala/Option;)V
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergParserUtils$.position(IcebergSqlExtensionsAstBuilder.scala:377)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergParserUtils$.withOrigin(IcebergSqlExtensionsAstBuilder.scala:367)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSqlExtensionsAstBuilder.visitSingleStatement(IcebergSqlExtensionsAstBuilder.scala:334)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSqlExtensionsAstBuilder.visitSingleStatement(IcebergSqlExtensionsAstBuilder.scala:66)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSqlExtensionsParser$SingleStatementContext.accept(IcebergSqlExtensionsParser.java:153)
    at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.$anonfun$parsePlan$1(IcebergSparkSqlExtensionsParser.scala:124)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.parse(IcebergSparkSqlExtensionsParser.scala:178)
    at org.apache.spark.sql.catalyst.parser.extensions.IcebergSparkSqlExtensionsParser.parsePlan(IcebergSparkSqlExtensionsParser.scala:124)
    at com.databricks.sql.parser.DatabricksSqlParser.$anonfun$parsePlan$1(DatabricksSqlParser.scala:80)
    at com.databricks.sql.parser.DatabricksSqlParser.parse(DatabricksSqlParser.scala:101)
    at com.databricks.sql.parser.DatabricksSqlParser.parsePlan(DatabricksSqlParser.scala:77)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:892)
    at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
    at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:452)
    at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:891)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1180)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:890)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:924)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
    at py4j.Gateway.invoke(Gateway.java:306)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
    at java.lang.Thread.run(Thread.java:750)
File <command-2355211326393001>, line 1
----> 1 spark.sql("CALL iceberg_catalog.system.expire_snapshots(table => 'iceberg_catalog.d11_stitch.rewards_third_party_impact_base_query',  older_than => TIMESTAMP '2024-03-06 00:00:00.000')")
File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

I tried using databricks Runtime 14.3 and 15.3 but getting the same issue on both the runtimes.

I was able to run this successfully on 14.1

Im adding the following jars

iceberg-spark-runtime-3.5_2.12-1.5.0.jar iceberg-aws-bundle-1.5.0.jar

Can anyone help here?

nastra commented 1 week ago

@as-sher I'd suggest to report this to Databricks directly instead of the Iceberg community, since the Iceberg community doesn't control on how Iceberg is working with Databricks

as-sher commented 1 week ago

@nastra I raised this up with databricks and got this as the response :-


Using Iceberg jars in Databricks Runtime (DBR) is not supported, even if it may work for certain versions (e.g. DBR 14.1).
The reason for this is that there are binary incompatibility issues between Iceberg and Spark, which can cause errors.
DBR 14.3 use Spark 3.5.0, but there's an internal change that broke binary compatibility with Iceberg.
 Databricks does not support Iceberg writes in DBR, and this error is an example of the conflict that hasn't been resolved.```

Can you suggest some workaround this?