
Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Apache Iceberg - Branch cannot be merged using the fast_forward procedure #9553

Open Ashwin07 opened 9 months ago

Ashwin07 commented 9 months ago

I have been trying to test the branching feature in Apache Iceberg 1.4.3, but I am facing the issue below with the fast_forward procedure call. Can you please let me know if this is an existing limitation or something I am missing?

Apache Iceberg = 1.4.3
Spark = 3.3.2
catalog name = silver_layer
namespace = iceberg_poc
table name = TEST_BRANCHING

spark.sql("""call silver_layer.system.fast_forward('silver_layer.iceberg_poc.TEST_BRANCHING', 'main', 'audit-branch')""").show()
Traceback (most recent call last):
  File "", line 1, in
  File "/opt/cloudera/parcels/SPARK3-3.3.2.3.3.7190.0-91-1.p0.45265883/lib/spark3/python/pyspark/sql/session.py", line 1034, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self)
  File "/opt/cloudera/parcels/SPARK3-3.3.2.3.3.7190.0-91-1.p0.45265883/lib/spark3/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in call
  File "/opt/cloudera/parcels/SPARK3-3.3.2.3.3.7190.0-91-1.p0.45265883/lib/spark3/python/pyspark/sql/utils.py", line 196, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: Procedure system.fast_forward not found
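For comparison, the examples in the Iceberg Spark procedures docs pass the table name relative to the catalog (namespace.table), while the call above repeats the catalog name inside the table string. A minimal sketch of how the statement is assembled; the fast_forward_sql helper is hypothetical and only builds the SQL text:

```python
# Hypothetical helper: assemble the CALL statement for Iceberg's
# fast_forward procedure (arguments per the Spark procedures docs:
# table, branch to fast-forward, branch to read from).
def fast_forward_sql(catalog: str, table: str, branch: str, to: str) -> str:
    # The procedure is resolved under <catalog>.system, which requires
    # IcebergSparkSessionExtensions in spark.sql.extensions.
    return f"CALL {catalog}.system.fast_forward('{table}', '{branch}', '{to}')"

stmt = fast_forward_sql("silver_layer", "iceberg_poc.TEST_BRANCHING",
                        "main", "audit-branch")
# The statement would then be executed with spark.sql(stmt).show()
```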

nastra commented 9 months ago

@Ashwin07 can you please share your full catalog configuration? It seems you might be missing https://iceberg.apache.org/docs/latest/spark-configuration/#sql-extensions
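For reference, the SQL-extensions setting from that docs page looks roughly like the following. This is a minimal config sketch using the silver_layer catalog name from this thread; the remaining catalog properties (uri, warehouse, credentials) are elided:

```shell
pyspark3 \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.4.3 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.silver_layer=org.apache.iceberg.spark.SparkCatalog
```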

Ashwin07 commented 9 months ago

Here is my PySpark session command:

pyspark3 \
  --conf spark.sql.catalog.gold_layer=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.gold_layer.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
  --conf spark.sql.catalog.gold_layer.uri=XXXXXXXXX \
  --conf spark.sql.catalog.gold_layer.ref=main \
  --conf spark.sql.catalog.gold_layer.warehouse=s3a://gold-layer-bkt/nessie_poc/metadata \
  --conf spark.sql.catalog.bronze_layer=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.bronze_layer.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
  --conf spark.sql.catalog.bronze_layer.uri=XXXXXXXXX \
  --conf spark.sql.catalog.bronze_layer.ref=main \
  --conf spark.sql.catalog.bronze_layer.warehouse=s3a://bronze-layer-bkt \
  --conf spark.sql.catalog.silver_layer=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.silver_layer.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
  --conf spark.sql.catalog.silver_layer.uri=XXXXXXXXX \
  --conf spark.sql.catalog.silver_layer.ref=main \
  --conf spark.sql.catalog.silver_layer.warehouse=s3a://silver-layer-bkt/nessie_poc/metadata \
  --conf spark.hadoop.fs.s3a.access.key=XXXXXXXXXX \
  --conf spark.hadoop.fs.s3a.secret.key=XXXXXXXXXXXXXXX \
  --conf spark.hadoop.fs.s3a.endpoint=XXXXXXXXXX \
  --conf spark.hadoop.fs.s3a.connection.ssl.enabled=false \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions \
  --packages org.apache.hadoop:hadoop-aws:3.2.0,org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.4.3,org.projectnessie.nessie-integrations:nessie-spark-extensions-3.3_2.12:0.75.0 \
  --repositories XXXXXXXXXXXXXX

nastra commented 9 months ago

I'm surprised that the procedure can't be found, given that you have org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions defined. Do any of the other procedures from https://iceberg.apache.org/docs/latest/spark-procedures/#snapshot-management work?

Ashwin07 commented 9 months ago

I have tried expire_snapshots, but it seems to throw a different error; at least it was not a blanket statement that the procedure cannot be found. See https://github.com/apache/iceberg/issues/9562.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the 'not-stale' label; commenting on the issue is preferred when possible.