apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Issue with CALL parsing #8343

Open pulkit-cldcvr opened 1 year ago

pulkit-cldcvr commented 1 year ago

Query engine

Spark

Question

I am trying to use the Iceberg Glue catalog with Spark. I am able to query the table data, but I am not able to run procedures.

Exception:- pyspark.sql.utils.ParseException: Syntax error at or near 'CALL'


Spark Config (posted as an image):

baptistegh commented 1 year ago

I have the exact same issue in my unit tests under Spark 3.4 / Iceberg 1.3. Everything works well except CALL statements and ALTER TABLE ... ADD|DROP PARTITION FIELD ... statements, even though spark.sql.extensions is correctly set as described by @pulkit-cldcvr.

manuzhang commented 1 year ago

@pulkit-cldcvr does the issue happen only in pyspark, or in spark-shell as well?

YuvalItzchakov commented 11 months ago

@manuzhang I am experiencing this in pyspark Jupyter notebook using Spark 3.4.1 on EMR Studio workspace.

sundhar010 commented 10 months ago

+1

RussellSpitzer commented 10 months ago

A quick guess at what might be going wrong: my assumption is that the session being used was not actually loaded with the extensions. I've seen this happen in a few different ways:

  1. (Most common in general) The Spark session had already been created by the time "getOrCreate" was called, so the extensions are ignored.
  2. (Most common in notebooks) The Spark session is improperly cloned between threads used by the kernel. I've seen this most often with kernels that use functional libraries (like cats) or something similar to manage execution. I'm not sure how this happens (I've only seen it sporadically), but sometimes certain cells use SparkSession.getActiveSession to execute their SQL, and when they do they end up picking up a session that was somehow cloned without the config set. When queried directly the config appears set, but when you access the "active session" during some executions it vanishes.
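
One way to check for the two cases above is to look at spark.sql.extensions on both the session variable you hold and the session returned by SparkSession.getActiveSession. The following is a minimal sketch, not a definitive diagnostic: the helper names (has_iceberg_extensions, describe_sessions, recreate_session) are hypothetical, and as noted above the config can appear set even when the extensions were not actually loaded. Stopping and recreating the session is also an assumption about what your environment permits; managed notebooks may not allow it, and any other settings on the old session are not carried over.

```python
from typing import Optional

ICEBERG_EXTENSIONS = (
    "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
)


def has_iceberg_extensions(conf_value: Optional[str]) -> bool:
    """True if a spark.sql.extensions value lists the Iceberg extensions class."""
    if not conf_value:
        return False
    # The setting may hold several comma-separated class names.
    return ICEBERG_EXTENSIONS in [part.strip() for part in conf_value.split(",")]


def describe_sessions(spark) -> dict:
    """Report the setting on `spark` and on the current thread's active session."""
    from pyspark.sql import SparkSession  # lazy import; requires pyspark

    active = SparkSession.getActiveSession()
    return {
        "given_session": has_iceberg_extensions(
            spark.conf.get("spark.sql.extensions", "")
        ),
        "active_session": active is not None
        and has_iceberg_extensions(active.conf.get("spark.sql.extensions", "")),
    }


def recreate_session():
    """Stop any active session, then build a new one with the extensions set."""
    from pyspark.sql import SparkSession  # lazy import; requires pyspark

    existing = SparkSession.getActiveSession()
    if existing is not None:
        existing.stop()  # also stops the underlying SparkContext
    return (
        SparkSession.builder
        .config("spark.sql.extensions", ICEBERG_EXTENSIONS)
        .getOrCreate()
    )
```

If describe_sessions(spark) reports False for either entry, the session was most likely created before the setting was applied.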
MojoML commented 9 months ago

+1

arvindeybram commented 6 months ago

I am facing the same issue in pyspark when creating external tables in Hive using the ICEBERG format.

[PARSE_SYNTAX_ERROR] Syntax error at or near 'ICEBERG'.(line 1, pos 42)

== SQL ==
CREATE EXTERNAL TABLE x (i int) STORED BY ICEBERG;
------------------------------------------^^^
qianzhen0 commented 2 months ago

In my case, adding spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions solved the problem.

kennyluke1023 commented 1 month ago

in my case, adding spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions solve the problem

Hi bro, I face the same problem too, but I am running pyspark in Colab. How could I run this command?
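
The setting is a Spark configuration entry, not a command, so in a notebook it would typically be passed to the session builder before the first session is created. The sketch below shows one way to do that, under several assumptions: the helper names (iceberg_options, build_session) are hypothetical, the catalog name "local", the Hadoop catalog type, and the warehouse path are placeholders, and the runtime artifact coordinate must be changed to match the Spark and Scala versions actually installed.

```python
def iceberg_options(
    catalog: str = "local",                    # placeholder catalog name
    warehouse: str = "/tmp/iceberg-warehouse",  # placeholder warehouse path
    # Assumed coordinate; pick the one matching your Spark/Scala versions.
    runtime: str = "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.1",
) -> dict:
    """Build the Spark settings for an Iceberg catalog with SQL extensions."""
    return {
        "spark.jars.packages": runtime,
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        f"spark.sql.catalog.{catalog}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{catalog}.type": "hadoop",
        f"spark.sql.catalog.{catalog}.warehouse": warehouse,
    }


def build_session(options: dict):
    """Apply every option to the builder before creating the session."""
    from pyspark.sql import SparkSession  # lazy import; requires pyspark

    builder = SparkSession.builder.appName("iceberg-demo")
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Usage would look like spark = build_session(iceberg_options()), run in a fresh runtime so that no earlier session exists when getOrCreate is called.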

Ravi-una commented 1 day ago

I am facing the same issue, shown below, while trying to use expire_snapshots in Glue version 4.0. I have added spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions as well. Is there any workaround?

spark.sql("""CALL catalog_name.system.expire_snapshots('db_name.table_name')""")

pyspark.sql.utils.ParseException: Syntax error at or near 'CALL'

Shekharrajak commented 1 day ago
"spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"

helps.

Ravi-una commented 1 day ago
"spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"

helps.

Yes, I have added all the below config, but still no luck:

conf.set("spark.sql.catalog.job_catalog", "org.apache.iceberg.spark.SparkCatalog")
conf.set("spark.sql.catalog.job_catalog.warehouse", args['iceberg_job_catalog_warehouse'])
conf.set("spark.sql.catalog.job_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
conf.set("spark.sql.catalog.job_catalog.type", "glue")
conf.set("spark.sql.catalog.job_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
conf.set("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
conf.set("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")