awslabs / python-deequ

Python API for Deequ
Apache License 2.0
713 stars 134 forks

When we are installing this it is working fine but import is not working fine #113

Open aviral-bhardwaj opened 1 year ago

aviral-bhardwaj commented 1 year ago

Describe the bug

When we install this library on the cluster, it installs successfully without any issues.

But when we import it, it raises an error:

IndexError: list index out of range

The full error log is:

IndexError                                Traceback (most recent call last)
in
----> 1 import pydeequ

/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    165     # Import the desired module. If you’re seeing this while debugging a failed import,
    166     # look at preceding stack frames for relevant error information.
--> 167     original_result = python_builtin_import(name, globals, locals, fromlist, level)
    168
    169     is_root_import = thread_local._nest_level == 1

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0fc4342d-3184-408f-8a68-6ece7de367e9/lib/python3.8/site-packages/pydeequ/__init__.py in
     19 from pydeequ.analyzers import AnalysisRunner
     20 from pydeequ.checks import Check, CheckLevel
---> 21 from pydeequ.configs import DEEQU_MAVEN_COORD
     22 from pydeequ.profiles import ColumnProfilerRunner
     23

/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
    165     # Import the desired module. If you’re seeing this while debugging a failed import,
    166     # look at preceding stack frames for relevant error information.
--> 167     original_result = python_builtin_import(name, globals, locals, fromlist, level)
    168
    169     is_root_import = thread_local._nest_level == 1

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0fc4342d-3184-408f-8a68-6ece7de367e9/lib/python3.8/site-packages/pydeequ/configs.py in
     35
     36
---> 37 DEEQU_MAVEN_COORD = _get_deequ_maven_config()
     38 IS_DEEQU_V1 = re.search("com\.amazon\.deequ\:deequ\:1.*", DEEQU_MAVEN_COORD) is not None

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0fc4342d-3184-408f-8a68-6ece7de367e9/lib/python3.8/site-packages/pydeequ/configs.py in _get_deequ_maven_config()
     26
     27 def _get_deequ_maven_config():
---> 28     spark_version = _get_spark_version()
     29     try:
     30         return SPARK_TO_DEEQU_COORD_MAPPING[spark_version[:3]]

/local_disk0/.ephemeral_nfs/envs/pythonEnv-0fc4342d-3184-408f-8a68-6ece7de367e9/lib/python3.8/site-packages/pydeequ/configs.py in _get_spark_version()
     21     ]
     22     output = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
---> 23     spark_version = output.stdout.decode().split("\n")[-2]
     24     return spark_version
     25

IndexError: list index out of range

chenliu0831 commented 1 year ago

@aviral-bhardwaj We changed how this library is used: the SPARK_VERSION environment variable is now required. See https://github.com/awslabs/python-deequ/pull/114.
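
For anyone landing here with the same traceback, here is a rough sketch of what seems to be going on and how the workaround would look. The first helper reuses the expression from the failing line in `configs.py` (`split("\n")[-2]`) to show that it raises `IndexError` when the version-detection command writes nothing to stdout, which is presumably what happens on this cluster. The second part sets `SPARK_VERSION` before `import pydeequ`. The helper names (`second_to_last_line`, `ensure_spark_version`) and the value `"3.3"` are illustrative assumptions, not part of pydeequ; use the Spark version your cluster actually runs.

```python
import os


def second_to_last_line(stdout: bytes) -> str:
    """Apply the same expression as the failing line in the traceback."""
    return stdout.decode().split("\n")[-2]


# With a newline-terminated version string the expression works:
# "3.3.0\n".split("\n") is ["3.3.0", ""], so index -2 is "3.3.0".
assert second_to_last_line(b"3.3.0\n") == "3.3.0"

# With empty stdout, "".split("\n") is [""], a one-element list,
# so index -2 does not exist and IndexError is raised.
try:
    second_to_last_line(b"")
except IndexError as exc:
    print(f"empty stdout reproduces the error: {exc}")


def ensure_spark_version(default: str) -> str:
    """Set SPARK_VERSION only if it is not already defined; return the value in effect."""
    return os.environ.setdefault("SPARK_VERSION", default)


# "3.3" is a placeholder (assumption); replace it with your cluster's Spark version.
spark_version = ensure_spark_version("3.3")
print(f"SPARK_VERSION={spark_version}")

# Only after the variable is set, import the library in your notebook:
# import pydeequ
```

The variable has to be set in the same Python process, and before the first `import pydeequ`, because the traceback shows the version lookup running at import time.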