Open amalgaonkar opened 1 year ago
As a workaround, the following helped:
Glue's Spark version was not being picked up, so I had to set it explicitly as a Python environment variable and then import the additional pydeequ packages.
import os
os.environ["SPARK_VERSION"] = "3.3"
from pydeequ.analyzers import *
from pydeequ.anomaly_detection import *
I had the same problem on Databricks, thanks for the workaround!
Describe the bug
While following this tutorial: https://github.com/awslabs/python-deequ/blob/master/tutorials/anomaly_detection.ipynb
Error:
The same error occurs for SimpleThresholdStrategy, RelativeRateOfChangeStrategy, etc.
To Reproduce
Steps to reproduce the behavior:
Create a Glue job with the same code as the tutorial: https://github.com/awslabs/python-deequ/blob/master/tutorials/anomaly_detection.ipynb
Except, before importing pydeequ, create the Python environment variable:
Use Glue version 4.0 (Spark 3.3).
To include the PyDeequ module, create a setup.py as shown below:
Copy the .whl file to an S3 location and reference it under additional Python libraries in the Glue job. Reference: https://repost.aws/knowledge-center/glue-import-error-no-module-named
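The snippet for the environment-variable step did not survive in the steps above. A minimal sketch of what that step presumably looks like, assuming the variable is SPARK_VERSION and the value is "3.3" to match Glue 4.0. The helper name ensure_spark_version and the PYDEEQU_AVAILABLE flag are made up for illustration and are not part of pydeequ. As far as I can tell, pydeequ reads SPARK_VERSION when it is imported, so the variable has to be set first.

```python
import os


def ensure_spark_version(default: str = "3.3") -> str:
    """Set SPARK_VERSION only if the runtime has not already set it.

    Hypothetical helper, not part of pydeequ. Returns the value in effect.
    """
    # setdefault leaves an existing value untouched and returns whichever
    # value ends up stored in the environment.
    return os.environ.setdefault("SPARK_VERSION", default)


# Must run before any pydeequ import.
spark_version = ensure_spark_version()

try:
    # Assumption: pydeequ looks up SPARK_VERSION at import time, so the
    # import comes only after the variable is in place.
    import pydeequ  # noqa: F401
    PYDEEQU_AVAILABLE = True
except ImportError:
    # pydeequ (or pyspark) is not installed in this environment.
    PYDEEQU_AVAILABLE = False
```

Using setdefault rather than a plain assignment means a value supplied by the job configuration is not overwritten.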
Expected behavior
The job should succeed and identify the anomaly.