Closed · Shubham-Jha-GT closed this issue 1 year ago
Updated the config to this (based on the Iceberg table configuration docs):

spark = (
    SparkSession.builder
    .config("spark.driver.memory", "25g")
    .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .appName(app_name)
    .getOrCreate()
)
I'm getting this new error -
An error occurred while calling o87.sql. Cannot find catalog plugin class for catalog 'spark_catalog': org.apache.iceberg.spark.SparkSessionCatalog
That error is usually caused by missing Iceberg dependencies. Have you configured your Glue job with the Iceberg connector from the AWS Marketplace? This post shows how: https://aws.amazon.com/es/blogs/big-data/implement-a-cdc-based-upsert-in-a-data-lake-using-apache-iceberg-and-aws-glue/
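In case it helps others hitting the same "Cannot find catalog plugin class" error outside Glue: it normally means the Iceberg runtime jar is not on the Spark classpath, so the catalog config alone is not enough. A minimal sketch of pulling the jar in via spark.jars.packages — the artifact version below is an assumption, pick the iceberg-spark-runtime coordinates that match your Spark and Scala versions:

```python
from pyspark.sql import SparkSession

# Assumed versions: iceberg-spark-runtime-3.3_2.12:1.3.1 is only an
# example; use the artifact matching your Spark/Scala build.
spark = (
    SparkSession.builder
    .appName("iceberg-read")
    # Download the Iceberg runtime jar so the catalog class can be found.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.1")
    # Enable Iceberg's SQL extensions.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Wrap the session catalog so Iceberg and non-Iceberg tables both resolve.
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.iceberg.spark.SparkSessionCatalog")
    .config("spark.sql.catalog.spark_catalog.type", "hive")
    .getOrCreate()
)
```

On Glue specifically the jar is supplied by the connector (or the --datalake-formats job parameter on newer Glue versions) rather than spark.jars.packages, but the catalog settings are the same.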
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
I'm trying to read data from an Iceberg table; the data is in ORC format and partitioned by a column. I'm getting this error -
This is my code:

spark = (
    SparkSession.builder
    .config("spark.driver.memory", "25g")
    .appName(app_name)
    .getOrCreate()
)
temp_tag_thrshld_data = spark.sql("SELECT * FROM dev_db.temp_tag_thrshld_iceberg")
If I replace my query with
spark.sql("SELECT * FROM a_normal_athena_table")
the code runs fine. I'm also not able to read the data directly from S3, since it's in ORC format with Snappy compression and I don't get any results (I'm probably missing the correct framework to read ORC from S3 directly, but that's another issue for another day). I've tried validating my table using
aws glue get-table --database-name dev_db --name temp_tag_thrshld_iceberg
and this is the output I got -