awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Generated code for fractional rule fails with "isContainedIn() takes 3 positional arguments but 5 were given" #25

Closed. nfx closed this issue 3 years ago

nfx commented 3 years ago

Describe the bug

Applying the 'code_for_constraint' from the suggestion below fails with the error in the title:

{'constraint_name': "ComplianceConstraint(Compliance('column' has value range 'A', 'B' for at least 90.0% of values,`state` IN ('A', 'B'),None))",
  'column_name': 'column',
  'current_value': 'Compliance: 0.929539295392954',
  'description': "'column' has value range 'A', 'B' for at least 90.0% of values",
  'suggesting_rule': 'FractionalCategoricalRangeRule(0.9)',
  'rule_description': 'If we see a categorical range for most values in a column, we suggest an IS IN (...) constraint that should hold for most values',
  'code_for_constraint': '.isContainedIn("column", ["A", "B"], lambda x: x >= 0.9, "It should be above 0.9!")'},

RainVagel commented 3 years ago

I am getting the same error. Looking at the code, it seems the suggestion analyser generates a call to a form of isContainedIn (with an assertion and a hint message) that is available in Deequ, the Scala library, but is not supported in PyDeequ.
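The mismatch can be reproduced without Spark or PyDeequ. The sketch below is plain Python: `CheckStandIn` and `apply_generated_code` are hypothetical names invented for illustration, and the stand-in method only mimics the two-argument form assumed from the error message in the title. It is not the PyDeequ implementation.

```python
class CheckStandIn:
    """Hypothetical stand-in for a check builder; NOT the PyDeequ class."""

    def isContainedIn(self, column, allowed_values):
        # Two-argument form only: a column name and the allowed values.
        self.constraint = (column, tuple(allowed_values))
        return self


def apply_generated_code(check):
    # Same call shape as the 'code_for_constraint' string in the report:
    # column, allowed values, an assertion, and a hint message.
    return check.isContainedIn(
        "column", ["A", "B"], lambda x: x >= 0.9, "It should be above 0.9!"
    )


try:
    apply_generated_code(CheckStandIn())
    error_message = None
except TypeError as exc:
    # Python counts `self` as a positional argument, hence "3" and "5".
    error_message = str(exc)

print(error_message)
```

The counts in the message include `self`: the method accepts three positional arguments (`self`, the column, the allowed values), while the generated call supplies five.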

However, I would also be very interested in using the isContainedIn function with an assertion.
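To make concrete what an assertion adds, here is a minimal plain-Python sketch of the semantics the suggested constraint describes: compute the fraction of values that fall in the allowed set, then apply a predicate to that fraction. The helper `is_contained_in`, its default of requiring every value to match, and its handling of empty input are assumptions made for illustration, not PyDeequ or Deequ behavior.

```python
def require_all(fraction):
    # Assumed default: every value must be in the allowed set.
    return fraction == 1.0


def is_contained_in(values, allowed_values, assertion=require_all):
    """Hypothetical helper: return (fraction in allowed set, assertion result)."""
    allowed = set(allowed_values)
    matching = sum(1 for value in values if value in allowed)
    # Treating empty input as fully compliant is an arbitrary choice here.
    fraction = matching / len(values) if values else 1.0
    return fraction, assertion(fraction)


# 93 of 100 values are in the allowed set, close to the reported compliance.
sample = ["A"] * 93 + ["C"] * 7

fraction, passed = is_contained_in(sample, ["A", "B"], lambda x: x >= 0.9)
strict_fraction, strict_passed = is_contained_in(sample, ["A", "B"])

print(fraction, passed)
print(strict_fraction, strict_passed)
```

With the threshold assertion the sample passes, while the strict default rejects the same data, which is why a fractional rule needs the assertion argument.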

zbodi74 commented 3 years ago

Here is a fix for this in my fork: https://github.com/zbodi74/python-deequ/commit/bb053e276f5372b5d1a32396c5016b9f0d5abc97 It is tentative for now, as I have not yet had a chance to familiarize myself with the code or to test the change thoroughly. Let me know if you see any problems. Feel free to merge it in the meantime; I can also submit a PR a little later.

nfx commented 3 years ago

@zbodi74 could you send that PR to this repo? :)

gucciwang commented 3 years ago

Fixed with PyDeequ-0.1.7!