awslabs / python-deequ

Python API for Deequ
Apache License 2.0

TestSuggestions test class not passing #84

Open roberthheise opened 2 years ago

roberthheise commented 2 years ago

Describe the bug
After importing the test_suggestions.py module, every test function in the TestSuggestions class fails.

To Reproduce
Steps to reproduce the behavior:

  1. Import the test_suggestions.py module
  2. Run the tests
  3. See error
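
The steps above can be sketched as a command line. The target and flags are copied from the "Launching pytest with arguments" line in the log under Additional context; running pytest through `python -m pytest` rather than PyCharm's runner, and the helper name `pytest_command`, are assumptions made for illustration.

```python
import sys


def pytest_command(target="test_suggestions.py::TestSuggestions"):
    """Build a pytest invocation for the failing test class.

    Hypothetical helper: the flags mirror the ones shown in the log,
    but PyCharm launched pytest through its own runner script.
    """
    return [sys.executable, "-m", "pytest", target,
            "--no-header", "--no-summary", "-q"]


# Example (not executed here):
#   import subprocess
#   subprocess.run(pytest_command(), check=False)
```

Running the command from the project directory should reproduce the failures if the environment matches the one in this report.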

Expected behavior
The tests should pass.


Additional context
/Users/rheise/PycharmProjects/validation/venv/bin/python "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pycharm/_jb_pytest_runner.py" --target test_suggestions.py::TestSuggestions
Testing started at 9:43 AM ...
Launching pytest with arguments test_suggestions.py::TestSuggestions --no-header --no-summary -q in /Users/rheise/PycharmProjects/validation

============================= test session starts ==============================
collecting ... collected 8 items

test_suggestions.py::TestSuggestions::test_CategoricalRangeRule
:: loading settings :: url = jar:file:/Users/rheise/PycharmProjects/validation/venv/lib/python3.8/site-packages/pyspark/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/rheise/PycharmProjects/validation/venv/lib/python3.8/site-packages/pyspark/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Ivy Default Cache set to: /Users/rheise/.ivy2/cache
The jars for the packages stored in: /Users/rheise/.ivy2/jars
com.amazon.deequ#deequ added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-20aa5f03-82aa-42b6-9e39-e6049dc4523b;1.0
    confs: [default]
    found com.amazon.deequ#deequ;1.2.2-spark-3.0 in central
    found org.scalanlp#breeze_2.12;0.13.2 in central
    found org.scalanlp#breeze-macros_2.12;0.13.2 in central
    found org.scala-lang#scala-reflect;2.12.1 in central
    found com.github.fommil.netlib#core;1.1.2 in central
    found net.sf.opencsv#opencsv;2.3 in central
    found com.github.rwl#jtransforms;2.4.0 in central
    found junit#junit;4.8.2 in central
    found org.apache.commons#commons-math3;3.2 in central
    found org.spire-math#spire_2.12;0.13.0 in central
    found org.spire-math#spire-macros_2.12;0.13.0 in central
    found org.typelevel#machinist_2.12;0.6.1 in central
    found com.chuusai#shapeless_2.12;2.3.2 in central
    found org.typelevel#macro-compat_2.12;1.1.1 in central
    found org.slf4j#slf4j-api;1.7.5 in central
:: resolution report :: resolve 318ms :: artifacts dl 17ms
    :: modules in use:
    com.amazon.deequ#deequ;1.2.2-spark-3.0 from central in [default]
    com.chuusai#shapeless_2.12;2.3.2 from central in [default]
    com.github.fommil.netlib#core;1.1.2 from central in [default]
    com.github.rwl#jtransforms;2.4.0 from central in [default]
    junit#junit;4.8.2 from central in [default]
    net.sf.opencsv#opencsv;2.3 from central in [default]
    org.apache.commons#commons-math3;3.2 from central in [default]
    org.scala-lang#scala-reflect;2.12.1 from central in [default]
    org.scalanlp#breeze-macros_2.12;0.13.2 from central in [default]
    org.scalanlp#breeze_2.12;0.13.2 from central in [default]
    org.slf4j#slf4j-api;1.7.5 from central in [default]
    org.spire-math#spire-macros_2.12;0.13.0 from central in [default]
    org.spire-math#spire_2.12;0.13.0 from central in [default]
    org.typelevel#machinist_2.12;0.6.1 from central in [default]
    org.typelevel#macro-compat_2.12;1.1.1 from central in [default]
    :: evicted modules:
    org.scala-lang#scala-reflect;2.12.0 by [org.scala-lang#scala-reflect;2.12.1] in [default]

|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   16  |   0   |   0   |   1   ||   15  |   0   |
---------------------------------------------------------------------

:: retrieving :: org.apache.spark#spark-submit-parent-20aa5f03-82aa-42b6-9e39-e6049dc4523b
    confs: [default]
    0 artifacts copied, 15 already retrieved (0kB/8ms)
21/11/10 09:43:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
FAILED [ 12%]
test_suggestions.py:32 (TestSuggestions.test_CategoricalRangeRule)
self =

def test_CategoricalRangeRule(self):
  result = self.ConstraintSuggestionRunner.onData(self.df).addConstraintRule(CategoricalRangeRule()).run()

test_suggestions.py:34:


venv/lib/python3.8/site-packages/pydeequ/suggestions.py:81: in run
    result = self._ConstraintSuggestionRunBuilder.run()
venv/lib/python3.8/site-packages/py4j/java_gateway.py:1309: in __call__
    return_value = get_return_value(
venv/lib/python3.8/site-packages/pyspark/sql/utils.py:111: in deco
    return f(*a, **kw)

answer = 'xro102'
gateway_client = <py4j.clientserver.JavaClient object at 0x1212e3a60>
target_id = 'o99', name = 'run'

def get_return_value(answer, gateway_client, target_id=None, name=None):
    """Converts an answer received from the Java gateway into a Python object.

    For example, string representation of integers are converted to Python
    integer, string representation of objects are converted to JavaObject
    instances, etc.

    :param answer: the string returned by the Java gateway
    :param gateway_client: the gateway client used to communicate with the Java
        Gateway. Only necessary if the answer is a reference (e.g., object,
        list, map)
    :param target_id: the name of the object from which the answer comes from
        (e.g., *object1* in `object1.hello()`). Optional.
    :param name: the name of the member from which the answer comes from
        (e.g., *hello* in `object1.hello()`). Optional.
    """
    if is_error(answer)[0]:
        if len(answer) > 1:
            type = answer[1]
            value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
            if answer[1] == REFERENCE_TYPE:
                raise Py4JJavaError(
                    "An error occurred while calling {0}{1}{2}.\n".
                    format(target_id, ".", name), value)
E   py4j.protocol.Py4JJavaError: An error occurred while calling o99.run.
E   : java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression(boolean)'
E       at org.apache.spark.sql.DeequFunctions$.withAggregateFunction(DeequFunctions.scala:31)
E       at org.apache.spark.sql.DeequFunctions$.stateful_approx_count_distinct(DeequFunctions.scala:60)
E       at com.amazon.deequ.analyzers.ApproxCountDistinct.aggregationFunctions(ApproxCountDistinct.scala:52)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$3(AnalysisRunner.scala:319)
E       at scala.collection.immutable.List.flatMap(List.scala:366)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:319)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
E       at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
E       at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
E       at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E       at py4j.Gateway.invoke(Gateway.java:282)
E       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E       at java.base/java.lang.Thread.run(Thread.java:834)
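
A possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: the PySpark install ships Spark 3.2.0 jars (spark-unsafe_2.12-3.2.0.jar), while Ivy resolved com.amazon.deequ#deequ;1.2.2-spark-3.0, an artifact whose name targets Spark 3.0. The JVM raises NoSuchMethodError when code calls a method signature that is missing from the classes loaded at runtime, which is what such a mismatch would produce. The sketch below scans log text for both versions; `find_version_mismatch` and `spark_minor` are hypothetical helper names, not part of pydeequ.

```python
import re


def spark_minor(version):
    """Return the 'major.minor' prefix of a version string such as '3.2.0'."""
    return ".".join(version.split(".")[:2])


def find_version_mismatch(log_text):
    """Compare the Spark version in PySpark's bundled jar names with the
    Spark suffix of the resolved Deequ artifact.

    Returns (installed_spark, deequ_target) when the two differ, or None
    when they match or when either version cannot be found in the text.
    """
    # e.g. "spark-unsafe_2.12-3.2.0.jar" -> "3.2.0"
    spark = re.search(r"spark-unsafe_[\d.]+-(\d+\.\d+\.\d+)\.jar", log_text)
    # e.g. "com.amazon.deequ#deequ;1.2.2-spark-3.0" -> "3.0"
    deequ = re.search(r"com\.amazon\.deequ#deequ;[\d.]+-spark-(\d+\.\d+)", log_text)
    if not spark or not deequ:
        return None
    installed, target = spark_minor(spark.group(1)), deequ.group(1)
    return (installed, target) if installed != target else None
```

Applied to the log in this report, the function would report Spark 3.2 against a Deequ build for Spark 3.0. Whether aligning the two makes the tests pass is untested here.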

venv/lib/python3.8/site-packages/py4j/protocol.py:326: Py4JJavaError
FAILED [ 25%]
test_suggestions.py:36 (TestSuggestions.test_CompleteIfCompleteRule)
self =

def test_CompleteIfCompleteRule(self):
  result = self.ConstraintSuggestionRunner.onData(self.df).addConstraintRule(CompleteIfCompleteRule()).run()

test_suggestions.py:38:


venv/lib/python3.8/site-packages/pydeequ/suggestions.py:81: in run
    result = self._ConstraintSuggestionRunBuilder.run()
venv/lib/python3.8/site-packages/py4j/java_gateway.py:1309: in __call__
    return_value = get_return_value(
venv/lib/python3.8/site-packages/pyspark/sql/utils.py:111: in deco
    return f(*a, **kw)

answer = 'xro106'
gateway_client = <py4j.clientserver.JavaClient object at 0x1212e3a60>
target_id = 'o103', name = 'run'

def get_return_value(answer, gateway_client, target_id=None, name=None):
    """Converts an answer received from the Java gateway into a Python object.

    For example, string representation of integers are converted to Python
    integer, string representation of objects are converted to JavaObject
    instances, etc.

    :param answer: the string returned by the Java gateway
    :param gateway_client: the gateway client used to communicate with the Java
        Gateway. Only necessary if the answer is a reference (e.g., object,
        list, map)
    :param target_id: the name of the object from which the answer comes from
        (e.g., *object1* in `object1.hello()`). Optional.
    :param name: the name of the member from which the answer comes from
        (e.g., *hello* in `object1.hello()`). Optional.
    """
    if is_error(answer)[0]:
        if len(answer) > 1:
            type = answer[1]
            value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
            if answer[1] == REFERENCE_TYPE:
                raise Py4JJavaError(
                    "An error occurred while calling {0}{1}{2}.\n".
                    format(target_id, ".", name), value)
E   py4j.protocol.Py4JJavaError: An error occurred while calling o103.run.
E   : java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression(boolean)'
E       at org.apache.spark.sql.DeequFunctions$.withAggregateFunction(DeequFunctions.scala:31)
E       at org.apache.spark.sql.DeequFunctions$.stateful_approx_count_distinct(DeequFunctions.scala:60)
E       at com.amazon.deequ.analyzers.ApproxCountDistinct.aggregationFunctions(ApproxCountDistinct.scala:52)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$3(AnalysisRunner.scala:319)
E       at scala.collection.immutable.List.flatMap(List.scala:366)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:319)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
E       at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
E       at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
E       at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E       at py4j.Gateway.invoke(Gateway.java:282)
E       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E       at java.base/java.lang.Thread.run(Thread.java:834)

venv/lib/python3.8/site-packages/py4j/protocol.py:326: Py4JJavaError
FAILED [ 37%]
test_suggestions.py:40 (TestSuggestions.test_FractionalCategoricalRangeRule)
self =

def test_FractionalCategoricalRangeRule(self):
    result = (
        self.ConstraintSuggestionRunner.onData(self.df).addConstraintRule(FractionalCategoricalRangeRule()).run()
    )

test_suggestions.py:43:


venv/lib/python3.8/site-packages/pydeequ/suggestions.py:81: in run
    result = self._ConstraintSuggestionRunBuilder.run()
venv/lib/python3.8/site-packages/py4j/java_gateway.py:1309: in __call__
    return_value = get_return_value(
venv/lib/python3.8/site-packages/pyspark/sql/utils.py:111: in deco
    return f(*a, **kw)

answer = 'xro110'
gateway_client = <py4j.clientserver.JavaClient object at 0x1212e3a60>
target_id = 'o107', name = 'run'

def get_return_value(answer, gateway_client, target_id=None, name=None):
    """Converts an answer received from the Java gateway into a Python object.

    For example, string representation of integers are converted to Python
    integer, string representation of objects are converted to JavaObject
    instances, etc.

    :param answer: the string returned by the Java gateway
    :param gateway_client: the gateway client used to communicate with the Java
        Gateway. Only necessary if the answer is a reference (e.g., object,
        list, map)
    :param target_id: the name of the object from which the answer comes from
        (e.g., *object1* in `object1.hello()`). Optional.
    :param name: the name of the member from which the answer comes from
        (e.g., *hello* in `object1.hello()`). Optional.
    """
    if is_error(answer)[0]:
        if len(answer) > 1:
            type = answer[1]
            value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
            if answer[1] == REFERENCE_TYPE:
                raise Py4JJavaError(
                    "An error occurred while calling {0}{1}{2}.\n".
                    format(target_id, ".", name), value)
E   py4j.protocol.Py4JJavaError: An error occurred while calling o107.run.
E   : java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression(boolean)'
E       at org.apache.spark.sql.DeequFunctions$.withAggregateFunction(DeequFunctions.scala:31)
E       at org.apache.spark.sql.DeequFunctions$.stateful_approx_count_distinct(DeequFunctions.scala:60)
E       at com.amazon.deequ.analyzers.ApproxCountDistinct.aggregationFunctions(ApproxCountDistinct.scala:52)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$3(AnalysisRunner.scala:319)
E       at scala.collection.immutable.List.flatMap(List.scala:366)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:319)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
E       at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
E       at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
E       at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E       at py4j.Gateway.invoke(Gateway.java:282)
E       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E       at java.base/java.lang.Thread.run(Thread.java:834)

venv/lib/python3.8/site-packages/py4j/protocol.py:326: Py4JJavaError
FAILED [ 50%]
test_suggestions.py:46 (TestSuggestions.test_NonNegativeNumbersRule)
self =

def test_NonNegativeNumbersRule(self):
  result = self.ConstraintSuggestionRunner.onData(self.df).addConstraintRule(NonNegativeNumbersRule()).run()

test_suggestions.py:48:


venv/lib/python3.8/site-packages/pydeequ/suggestions.py:81: in run
    result = self._ConstraintSuggestionRunBuilder.run()
venv/lib/python3.8/site-packages/py4j/java_gateway.py:1309: in __call__
    return_value = get_return_value(
venv/lib/python3.8/site-packages/pyspark/sql/utils.py:111: in deco
    return f(*a, **kw)

answer = 'xro114'
gateway_client = <py4j.clientserver.JavaClient object at 0x1212e3a60>
target_id = 'o111', name = 'run'

def get_return_value(answer, gateway_client, target_id=None, name=None):
    """Converts an answer received from the Java gateway into a Python object.

    For example, string representation of integers are converted to Python
    integer, string representation of objects are converted to JavaObject
    instances, etc.

    :param answer: the string returned by the Java gateway
    :param gateway_client: the gateway client used to communicate with the Java
        Gateway. Only necessary if the answer is a reference (e.g., object,
        list, map)
    :param target_id: the name of the object from which the answer comes from
        (e.g., *object1* in `object1.hello()`). Optional.
    :param name: the name of the member from which the answer comes from
        (e.g., *hello* in `object1.hello()`). Optional.
    """
    if is_error(answer)[0]:
        if len(answer) > 1:
            type = answer[1]
            value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
            if answer[1] == REFERENCE_TYPE:
                raise Py4JJavaError(
                    "An error occurred while calling {0}{1}{2}.\n".
                    format(target_id, ".", name), value)
E   py4j.protocol.Py4JJavaError: An error occurred while calling o111.run.
E   : java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression(boolean)'
E       at org.apache.spark.sql.DeequFunctions$.withAggregateFunction(DeequFunctions.scala:31)
E       at org.apache.spark.sql.DeequFunctions$.stateful_approx_count_distinct(DeequFunctions.scala:60)
E       at com.amazon.deequ.analyzers.ApproxCountDistinct.aggregationFunctions(ApproxCountDistinct.scala:52)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$3(AnalysisRunner.scala:319)
E       at scala.collection.immutable.List.flatMap(List.scala:366)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:319)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
E       at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
E       at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
E       at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E       at py4j.Gateway.invoke(Gateway.java:282)
E       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E       at java.base/java.lang.Thread.run(Thread.java:834)

venv/lib/python3.8/site-packages/py4j/protocol.py:326: Py4JJavaError
FAILED [ 62%]
test_suggestions.py:50 (TestSuggestions.test_RetainCompletenessRule)
self =

def test_RetainCompletenessRule(self):
  result = self.ConstraintSuggestionRunner.onData(self.df).addConstraintRule(RetainCompletenessRule()).run()

test_suggestions.py:52:


venv/lib/python3.8/site-packages/pydeequ/suggestions.py:81: in run
    result = self._ConstraintSuggestionRunBuilder.run()
venv/lib/python3.8/site-packages/py4j/java_gateway.py:1309: in __call__
    return_value = get_return_value(
venv/lib/python3.8/site-packages/pyspark/sql/utils.py:111: in deco
    return f(*a, **kw)

answer = 'xro118'
gateway_client = <py4j.clientserver.JavaClient object at 0x1212e3a60>
target_id = 'o115', name = 'run'

def get_return_value(answer, gateway_client, target_id=None, name=None):
    """Converts an answer received from the Java gateway into a Python object.

    For example, string representation of integers are converted to Python
    integer, string representation of objects are converted to JavaObject
    instances, etc.

    :param answer: the string returned by the Java gateway
    :param gateway_client: the gateway client used to communicate with the Java
        Gateway. Only necessary if the answer is a reference (e.g., object,
        list, map)
    :param target_id: the name of the object from which the answer comes from
        (e.g., *object1* in `object1.hello()`). Optional.
    :param name: the name of the member from which the answer comes from
        (e.g., *hello* in `object1.hello()`). Optional.
    """
    if is_error(answer)[0]:
        if len(answer) > 1:
            type = answer[1]
            value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
            if answer[1] == REFERENCE_TYPE:
                raise Py4JJavaError(
                    "An error occurred while calling {0}{1}{2}.\n".
                    format(target_id, ".", name), value)
E   py4j.protocol.Py4JJavaError: An error occurred while calling o115.run.
E   : java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression(boolean)'
E       at org.apache.spark.sql.DeequFunctions$.withAggregateFunction(DeequFunctions.scala:31)
E       at org.apache.spark.sql.DeequFunctions$.stateful_approx_count_distinct(DeequFunctions.scala:60)
E       at com.amazon.deequ.analyzers.ApproxCountDistinct.aggregationFunctions(ApproxCountDistinct.scala:52)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$3(AnalysisRunner.scala:319)
E       at scala.collection.immutable.List.flatMap(List.scala:366)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:319)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
E       at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
E       at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
E       at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E       at py4j.Gateway.invoke(Gateway.java:282)
E       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E       at java.base/java.lang.Thread.run(Thread.java:834)

venv/lib/python3.8/site-packages/py4j/protocol.py:326: Py4JJavaError
FAILED [ 75%]
test_suggestions.py:54 (TestSuggestions.test_RetainTypeRule)
self =

def test_RetainTypeRule(self):
  result = self.ConstraintSuggestionRunner.onData(self.df).addConstraintRule(RetainTypeRule()).run()

test_suggestions.py:56:


venv/lib/python3.8/site-packages/pydeequ/suggestions.py:81: in run
    result = self._ConstraintSuggestionRunBuilder.run()
venv/lib/python3.8/site-packages/py4j/java_gateway.py:1309: in __call__
    return_value = get_return_value(
venv/lib/python3.8/site-packages/pyspark/sql/utils.py:111: in deco
    return f(*a, **kw)

answer = 'xro122'
gateway_client = <py4j.clientserver.JavaClient object at 0x1212e3a60>
target_id = 'o119', name = 'run'

def get_return_value(answer, gateway_client, target_id=None, name=None):
    """Converts an answer received from the Java gateway into a Python object.

    For example, string representation of integers are converted to Python
    integer, string representation of objects are converted to JavaObject
    instances, etc.

    :param answer: the string returned by the Java gateway
    :param gateway_client: the gateway client used to communicate with the Java
        Gateway. Only necessary if the answer is a reference (e.g., object,
        list, map)
    :param target_id: the name of the object from which the answer comes from
        (e.g., *object1* in `object1.hello()`). Optional.
    :param name: the name of the member from which the answer comes from
        (e.g., *hello* in `object1.hello()`). Optional.
    """
    if is_error(answer)[0]:
        if len(answer) > 1:
            type = answer[1]
            value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
            if answer[1] == REFERENCE_TYPE:
                raise Py4JJavaError(
                    "An error occurred while calling {0}{1}{2}.\n".
                    format(target_id, ".", name), value)
E   py4j.protocol.Py4JJavaError: An error occurred while calling o119.run.
E   : java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression(boolean)'
E       at org.apache.spark.sql.DeequFunctions$.withAggregateFunction(DeequFunctions.scala:31)
E       at org.apache.spark.sql.DeequFunctions$.stateful_approx_count_distinct(DeequFunctions.scala:60)
E       at com.amazon.deequ.analyzers.ApproxCountDistinct.aggregationFunctions(ApproxCountDistinct.scala:52)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$3(AnalysisRunner.scala:319)
E       at scala.collection.immutable.List.flatMap(List.scala:366)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:319)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
E       at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
E       at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
E       at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
E       at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
E       at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E       at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E       at py4j.Gateway.invoke(Gateway.java:282)
E       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E       at py4j.commands.CallCommand.execute(CallCommand.java:79)
E       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E       at java.base/java.lang.Thread.run(Thread.java:834)

venv/lib/python3.8/site-packages/py4j/protocol.py:326: Py4JJavaError
FAILED [ 87%]
test_suggestions.py:58 (TestSuggestions.test_UniqueIfApproximatelyUniqueRule)
self =

def test_UniqueIfApproximatelyUniqueRule(self):
    result = (
        self.ConstraintSuggestionRunner.onData(self.df).addConstraintRule(UniqueIfApproximatelyUniqueRule()).run()
    )

test_suggestions.py:61:


venv/lib/python3.8/site-packages/pydeequ/suggestions.py:81: in run
    result = self._ConstraintSuggestionRunBuilder.run()
venv/lib/python3.8/site-packages/py4j/java_gateway.py:1309: in __call__
    return_value = get_return_value(
venv/lib/python3.8/site-packages/pyspark/sql/utils.py:111: in deco
    return f(*a, **kw)


answer = 'xro126'
gateway_client = <py4j.clientserver.JavaClient object at 0x1212e3a60>
target_id = 'o123', name = 'run'

def get_return_value(answer, gateway_client, target_id=None, name=None):
    """Converts an answer received from the Java gateway into a Python object.

    For example, string representation of integers are converted to Python
    integer, string representation of objects are converted to JavaObject
    instances, etc.

    :param answer: the string returned by the Java gateway
    :param gateway_client: the gateway client used to communicate with the Java
        Gateway. Only necessary if the answer is a reference (e.g., object,
        list, map)
    :param target_id: the name of the object from which the answer comes from
        (e.g., *object1* in `object1.hello()`). Optional.
    :param name: the name of the member from which the answer comes from
        (e.g., *hello* in `object1.hello()`). Optional.
    """
    if is_error(answer)[0]:
        if len(answer) > 1:
            type = answer[1]
            value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
            if answer[1] == REFERENCE_TYPE:
              raise Py4JJavaError(
                  "An error occurred while calling {0}{1}{2}.\n".
                  format(target_id, ".", name), value)
E py4j.protocol.Py4JJavaError: An error occurred while calling o123.run.
E : java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression(boolean)'
E     at org.apache.spark.sql.DeequFunctions$.withAggregateFunction(DeequFunctions.scala:31)
E     at org.apache.spark.sql.DeequFunctions$.stateful_approx_count_distinct(DeequFunctions.scala:60)
E     at com.amazon.deequ.analyzers.ApproxCountDistinct.aggregationFunctions(ApproxCountDistinct.scala:52)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$3(AnalysisRunner.scala:319)
E     at scala.collection.immutable.List.flatMap(List.scala:366)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:319)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
E     at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
E     at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
E     at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
E     at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
E     at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
E     at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
E     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E     at py4j.Gateway.invoke(Gateway.java:282)
E     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E     at py4j.commands.CallCommand.execute(CallCommand.java:79)
E     at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E     at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E     at java.base/java.lang.Thread.run(Thread.java:834)

venv/lib/python3.8/site-packages/py4j/protocol.py:326: Py4JJavaError
FAILED [100%]
test_suggestions.py:64 (TestSuggestions.test_default)
self =

def test_default(self):
    result = self.ConstraintSuggestionRunner.onData(self.df).addConstraintRule(DEFAULT()).run()

test_suggestions.py:66:


venv/lib/python3.8/site-packages/pydeequ/suggestions.py:81: in run
    result = self._ConstraintSuggestionRunBuilder.run()
venv/lib/python3.8/site-packages/py4j/java_gateway.py:1309: in __call__
    return_value = get_return_value(
venv/lib/python3.8/site-packages/pyspark/sql/utils.py:111: in deco
    return f(*a, **kw)


answer = 'xro142'
gateway_client = <py4j.clientserver.JavaClient object at 0x1212e3a60>
target_id = 'o127', name = 'run'

def get_return_value(answer, gateway_client, target_id=None, name=None):
    """Converts an answer received from the Java gateway into a Python object.

    For example, string representation of integers are converted to Python
    integer, string representation of objects are converted to JavaObject
    instances, etc.

    :param answer: the string returned by the Java gateway
    :param gateway_client: the gateway client used to communicate with the Java
        Gateway. Only necessary if the answer is a reference (e.g., object,
        list, map)
    :param target_id: the name of the object from which the answer comes from
        (e.g., *object1* in `object1.hello()`). Optional.
    :param name: the name of the member from which the answer comes from
        (e.g., *hello* in `object1.hello()`). Optional.
    """
    if is_error(answer)[0]:
        if len(answer) > 1:
            type = answer[1]
            value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
            if answer[1] == REFERENCE_TYPE:
              raise Py4JJavaError(
                  "An error occurred while calling {0}{1}{2}.\n".
                  format(target_id, ".", name), value)
E py4j.protocol.Py4JJavaError: An error occurred while calling o127.run.
E : java.lang.NoSuchMethodError: 'org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction.toAggregateExpression(boolean)'
E     at org.apache.spark.sql.DeequFunctions$.withAggregateFunction(DeequFunctions.scala:31)
E     at org.apache.spark.sql.DeequFunctions$.stateful_approx_count_distinct(DeequFunctions.scala:60)
E     at com.amazon.deequ.analyzers.ApproxCountDistinct.aggregationFunctions(ApproxCountDistinct.scala:52)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$3(AnalysisRunner.scala:319)
E     at scala.collection.immutable.List.flatMap(List.scala:366)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:319)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
E     at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
E     at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
E     at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
E     at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
E     at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
E     at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
E     at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
E     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
E     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
E     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
E     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
E     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
E     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
E     at py4j.Gateway.invoke(Gateway.java:282)
E     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E     at py4j.commands.CallCommand.execute(CallCommand.java:79)
E     at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E     at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E     at java.base/java.lang.Thread.run(Thread.java:834)

venv/lib/python3.8/site-packages/py4j/protocol.py:326: Py4JJavaError

test_suggestions.py::TestSuggestions::test_CompleteIfCompleteRule
test_suggestions.py::TestSuggestions::test_FractionalCategoricalRangeRule
test_suggestions.py::TestSuggestions::test_NonNegativeNumbersRule
test_suggestions.py::TestSuggestions::test_RetainCompletenessRule
test_suggestions.py::TestSuggestions::test_RetainTypeRule
test_suggestions.py::TestSuggestions::test_UniqueIfApproximatelyUniqueRule
test_suggestions.py::TestSuggestions::test_default

============================== 8 failed in 7.66s ===============================

Process finished with exit code 1
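
A note for readers of this log: Ivy resolves `com.amazon.deequ#deequ;1.2.2-spark-3.0`, while the Spark jars on the classpath are 3.2.0 (`spark-unsafe_2.12-3.2.0.jar`). A `NoSuchMethodError` on a Spark-internal method such as `toAggregateExpression(boolean)` is most likely a sign that the Deequ jar was compiled against a different Spark release than the one running. Below is a minimal sketch of a pre-flight check, assuming the Deequ coordinate carries a `-spark-X.Y` suffix as it does in this log. The helper names (`spark_minor`, `deequ_target_spark`, `is_matching`) are hypothetical and not part of pydeequ, and matching on the minor release is a heuristic rather than a documented compatibility rule.

```python
import re
from typing import Optional


def spark_minor(version: str) -> str:
    """Return the 'major.minor' part of a Spark version such as '3.2.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised Spark version: {version!r}")
    return f"{match.group(1)}.{match.group(2)}"


def deequ_target_spark(coord: str) -> Optional[str]:
    """Extract the Spark release a Deequ build targets from its coordinate.

    Assumes the '-spark-X.Y' suffix convention seen in the log; returns None
    when the coordinate has no such suffix.
    """
    match = re.search(r"-spark-(\d+\.\d+)", coord)
    return match.group(1) if match else None


def is_matching(spark_version: str, deequ_coord: str) -> bool:
    """Heuristic: the Deequ build should target the running Spark minor release."""
    target = deequ_target_spark(deequ_coord)
    return target is not None and target == spark_minor(spark_version)


# Spark 3.2.0 and the Deequ coordinate are taken from the log in this issue.
print(is_matching("3.2.0", "com.amazon.deequ#deequ;1.2.2-spark-3.0"))  # False
# "3.0.1" is an illustrative version, not one that appears in the log.
print(is_matching("3.0.1", "com.amazon.deequ#deequ;1.2.2-spark-3.0"))  # True
```

In a real session the two inputs would come from `pyspark.__version__` and `pydeequ.deequ_maven_coord`. A `False` result suggests either pinning pyspark to the release the jar targets or choosing a Deequ build made for the installed Spark.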

mycaule commented 2 years ago

Same here, using pyspark==3.1.2

java.lang.NoSuchMethodError: 
'org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression

and

import pydeequ
from pyspark.sql import SparkSession

session = SparkSession.builder \
    .config("spark.jars.packages", pydeequ.deequ_maven_coord) \
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord) \
    .master("local") \
    .appName("my_app") \
    .getOrCreate()