databricks / containers

Sample base images for Databricks Container Services
Apache License 2.0

DBR 13.3 missing required package - ImportError: grpcio >= 1.48.1 must be installed #177

Closed: helloimowen closed this issue 7 months ago

helloimowen commented 7 months ago

Hi,

We based some of our container images on the public container build on Docker Hub, and others on an intermediate copy in our own container repository. Today, when we built new images from the Docker Hub base image, jobs began to fail with the following error in the standard error logs:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/sql/connect/utils.py", line 45, in require_minimum_grpc_version
    import grpc
ModuleNotFoundError: No module named 'grpc'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/databricks/python_shell/scripts/db_ipykernel_launcher.py", line 35, in <module>
    from dbruntime.PipMagicOverrides import PipMagicOverrides
  File "/databricks/python_shell/dbruntime/PipMagicOverrides.py", line 8, in <module>
    from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
  File "/databricks/spark/python/pyspark/sql/connect/session.py", line 19, in <module>
    check_dependencies(__name__)
  File "/databricks/spark/python/pyspark/sql/connect/utils.py", line 35, in check_dependencies
    require_minimum_grpc_version()
  File "/databricks/spark/python/pyspark/sql/connect/utils.py", line 47, in require_minimum_grpc_version
    raise ImportError(
ImportError: grpcio >= 1.48.1 must be installed; however, it was not found.
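The traceback shows `require_minimum_grpc_version` in `pyspark/sql/connect/utils.py` failing when the kernel starts, so a broken base image is only discovered once a job runs. As a hedged sketch (not something the Databricks images ship), a build-time check like the one below could surface the missing dependency earlier. The 1.48.1 minimum is taken from the error message; `parse_release`, `check_grpcio` and `main` are hypothetical names. It uses only the standard library (`importlib.metadata`), so it does not assume `packaging` is present in the image.

```python
"""Hypothetical build-time preflight: fail fast if grpcio is missing or too old.

Assumption: the 1.48.1 minimum is copied from the pyspark error message;
adjust MIN_GRPCIO if your runtime version requires something different.
"""
import re
import sys
from importlib import metadata

MIN_GRPCIO = (1, 48, 1)


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '1.59.0rc1' -> (1, 59, 0).

    Pre-release/dev suffixes are ignored, which is good enough for a minimum check.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_grpcio(minimum: tuple = MIN_GRPCIO) -> str:
    """Return an error message if grpcio is absent or older than `minimum`, else ''."""
    try:
        installed = metadata.version("grpcio")
    except metadata.PackageNotFoundError:
        return "grpcio is not installed"
    # Tuple comparison: (1, 48) < (1, 48, 1), so '1.48' correctly counts as too old.
    if parse_release(installed) < minimum:
        wanted = ".".join(map(str, minimum))
        return f"grpcio {installed} is older than the required {wanted}"
    return ""


def main() -> int:
    """Print the result and return a process exit code (0 = OK, 1 = problem)."""
    problem = check_grpcio()
    if problem:
        print(f"preflight failed: {problem}", file=sys.stderr)
        return 1
    print("grpcio OK")
    return 0
```

To use it, you could save this as a file (say, a hypothetical `preflight_grpc.py`), append `sys.exit(main())`, and `RUN` it with the same Python interpreter the runtime uses, after your `pip install` steps in the Dockerfile, so that the image build fails instead of the job.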

However, it seems that a much older commit first removed the grpcio >= 1.48.1 pin: https://github.com/databricks/containers/commit/cbcb2ce494d54fdb25d966f51ac07a840a3dd983

We worked around the problem by switching back to our internal intermediate image. We wanted to open an issue in case others are seeing similar failures, and to track whether this version pin is restored.

xinzhao-db commented 7 months ago

Fixed in https://github.com/databricks/containers/pull/178