Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Error while connecting to REST catalog using Spark #11477

Open Gowthami03B opened 2 weeks ago

Gowthami03B commented 2 weeks ago

Apache Iceberg version

1.4.3

Query engine

Spark

Please describe the bug 🐞

Spark config and code -

iceberg_rest = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.my_catalog.type": "rest",
    "spark.sql.catalog.my_catalog.uri": "https://rest.dev.com",
    "spark.sql.catalog.my_catalog.warehouse": "s3a://my-warehouse",
    "spark.sql.catalog.my_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.my_catalog.s3.access-key-id": "XXX",
    "spark.sql.catalog.my_catalog.s3.secret-access-key": "XXX",
}
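On startup, the Iceberg REST catalog client fetches its settings from the catalog's `GET /v1/config` endpoint (defined by the Iceberg REST OpenAPI spec), so a quick probe of that endpoint outside Spark can separate network/TLS problems from Spark misconfiguration. A minimal stdlib-only sketch, assuming the `https://rest.dev.com` URI and `s3a://my-warehouse` warehouse from the config above (the helper names are hypothetical):

```python
import json
import urllib.request
from urllib.parse import urlencode


def config_url(base_uri: str, warehouse: str) -> str:
    # Build the URL for the catalog's config endpoint; the warehouse
    # is passed as a query parameter per the Iceberg REST spec.
    return base_uri.rstrip("/") + "/v1/config?" + urlencode({"warehouse": warehouse})


def probe_rest_catalog(base_uri: str, warehouse: str) -> dict:
    # A 200 response with a JSON body (defaults/overrides) means the
    # catalog is reachable; an HTTP or TLS error here rules out Spark
    # itself as the cause of the RESTException.
    with urllib.request.urlopen(config_url(base_uri, warehouse), timeout=10) as resp:
        return json.load(resp)


# Example (would perform a live request against the catalog):
#   print(probe_rest_catalog("https://rest.dev.com", "s3a://my-warehouse"))
```

If this probe fails the same way, the problem is between the Spark cluster and the REST endpoint (DNS, TLS certificates, auth), not the table DDL.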

spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.datasets.customers2 (
    customer_id INT,
    customer_name STRING,
    date DATE,
    transaction_details STRING) USING iceberg""").show(10)

Error -

SparkConnectGrpcException                 Traceback (most recent call last)
----> 1 spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.datasets.customers2 (
      2          customer_id INT,
      3         customer_name STRING,
      4         date DATE,
      5         transaction_details STRING) USING iceberg""").show(10)

 in sql(self, sqlQuery, args)
    548     def sql(self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None) -> "DataFrame":
    549         cmd = SQL(sqlQuery, args)
--> 550         data, properties = self.client.execute_command(cmd.command(self._client))
    551         if "sql_command_result" in properties:
    552             return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self)

[/layers/com.ds.buildpacks.pip/requirements/lib/python3.11/site-packages/pyspark/sql/connect/client/core.py] in execute_command(self, command)
    980             req.user_context.user_id = self._user_id
    981         req.plan.command.CopyFrom(command)
--> 982         data, _, _, _, properties = self._execute_and_fetch(req)
    983         if data is not None:
    984             return (data.to_pandas(), properties)

[/layers/com.ds.buildpacks.pip/requirements/lib/python3.11/site-packages/pyspark/sql/connect/client/core.py] in _execute_and_fetch(self, req, self_destruct)
   1280         properties: Dict[str, Any] = {}
   1281 
-> 1282         for response in self._execute_and_fetch_as_iterator(req):
   1283             if isinstance(response, StructType):
   1284                 schema = response

[/layers/com.ds.buildpacks.pip/requirements/lib/python3.11/site-packages/pyspark/sql/connect/client/core.py] in _execute_and_fetch_as_iterator(self, req)
   1261                             yield from handle_response(b)
   1262         except Exception as error:
-> 1263             self._handle_error(error)
   1264 
   1265     def _execute_and_fetch(

[/layers/com.ds.buildpacks.pip/requirements/lib/python3.11/site-packages/pyspark/sql/connect/client/core.py] in _handle_error(self, error)
   1500         """
   1501         if isinstance(error, grpc.RpcError):
-> 1502             self._handle_rpc_error(error)
   1503         elif isinstance(error, ValueError):
   1504             if "Cannot invoke RPC" in str(error) and "closed" in str(error):

[/layers/com.ds.buildpacks.pip/requirements/lib/python3.11/site-packages/pyspark/sql/connect/client/core.py] in _handle_rpc_error(self, rpc_error)
   1536                     info = error_details_pb2.ErrorInfo()
   1537                     d.Unpack(info)
-> 1538                     raise convert_exception(info, status.message) from None
   1539 
   1540             raise SparkConnectGrpcException(status.message) from None

SparkConnectGrpcException: (org.apache.iceberg.exceptions.RESTException) Error occurred while processing GET request

nastra commented 2 weeks ago

@Gowthami03B unfortunately the stack trace isn't very helpful. Are there any additional details in the client or server logs that would indicate what exactly went wrong?
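Since Spark Connect surfaces only the server-side exception message to the client, one way to get the detail asked for above is to raise Iceberg's own log level on the Spark (server) side, which should log the failing request's URL and response. A hedged sketch of a `log4j2.properties` fragment, assuming Spark's default log4j2 setup:

```properties
# Hypothetical log4j2.properties fragment for the Spark driver/server:
# turn on DEBUG logging for all Iceberg classes, including the REST client.
logger.iceberg.name = org.apache.iceberg
logger.iceberg.level = debug
```

With this in place, rerunning the `CREATE TABLE` statement and checking the driver logs should show which GET request the REST client was making when it failed.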