canimus / cuallee

Possibly the fastest DataFrame-agnostic quality check library in town.
https://canimus.github.io/cuallee/
Apache License 2.0

Getting Error for sample code provided in Readme file #306

Closed Ranji-1712 closed 2 months ago

Ranji-1712 commented 2 months ago

Code :

check = Check(CheckLevel.WARNING, "CheckIsBetweenDates")
df = spark.sql(
    """
    SELECT
        explode(
            sequence(
                to_date('2022-01-01'),
                to_date('2022-01-10'),
                interval 1 day)) as date
    """)
assert (
    check.is_between("date", "2022-01-01", "2022-01-10")
    .validate(df)
    .first()
    .status == "PASS"
)

Error:

    124 def __post_init__(self):
--> 125     if (self.coverage <= 0) or (self.coverage > 1):
    126         raise ValueError("Coverage should be between 0 and 1")
    128     if isinstance(self.column, List):

TypeError: '<=' not supported between instances of 'str' and 'int'

canimus commented 2 months ago

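Hi @Ranji-1712, thanks for raising the issue. The problem is in the arguments passed to is_between: it should receive the two bounds as a single collection, not as separate positional arguments. In your call the second date appears to be taken as the coverage, which is why the check in __post_init__ ends up comparing a str with an int.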

If you use:

check.is_between("date", ("2022-01-01", "2022-01-10"))

then it should work. We will update the README.md to fix this. Thank you!
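
To illustrate why the original call fails with a TypeError rather than a clearer message, here is a minimal sketch. The `Rule` dataclass and the `is_between` function below are hypothetical stand-ins written for this comment, not cuallee's actual code; the only part taken from the traceback is the coverage guard in `__post_init__`, and the parameter name `pct` is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class Rule:
    """Hypothetical stand-in for a rule object (not cuallee's real class)."""

    column: str
    value: Any
    coverage: float = 1.0

    def __post_init__(self):
        # Same guard as in the traceback: comparing a str with 0 raises TypeError.
        if (self.coverage <= 0) or (self.coverage > 1):
            raise ValueError("Coverage should be between 0 and 1")


def is_between(column: str, value: Tuple[Any, Any], pct: float = 1.0) -> Rule:
    """Assumed signature: both bounds in one tuple, then an optional coverage."""
    return Rule(column, value, pct)


# Bounds passed separately: the second date lands in `pct` (the coverage).
try:
    is_between("date", "2022-01-01", "2022-01-10")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Bounds passed as one tuple: the coverage keeps its default of 1.0.
rule = is_between("date", ("2022-01-01", "2022-01-10"))
```

With the tuple form, both dates stay together in `value` and nothing spills over into the coverage argument.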

Ranji-1712 commented 2 months ago

@canimus I am still facing the issue below.

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[11], line 3
      1 (
      2     check.is_between("date", ("2022-01-01", "2022-01-10"))
----> 3     .validate(df)
      4     .first()
      5     .status == "PASS"
      6 )

File ~/python3.11/site-packages/cuallee/__init__.py:1249, in Check.validate(self, dataframe)
   1246     self.compute_engine = importlib.import_module("cuallee.daft_validation")
   1248 else:
-> 1249     raise Exception(
   1250         "Cuallee is not ready for this data structure. You can log a Feature Request in Github."
   1251     )
   1253 assert self.compute_engine.validate_data_types(
   1254     self.rules, dataframe
   1255 ), "Invalid data types between rules and dataframe"
   1257 return self.compute_engine.summary(self, dataframe)

Exception: Cuallee is not ready for this data structure. You can log a Feature Request in Github.

canimus commented 2 months ago

Hi @Ranji-1712, this is something different, right? It is not the code in the README.md. Without looking into your code sample, and just reading the exception, it seems that unfortunately cuallee does not support the data structure you are trying to test against. If you could show or elaborate on the problem, perhaps it will be clearer where it lies. My recommendation is also to look at the test folder in the repo, as there are plenty of examples of how to implement specific checks for each of the supported dataframes.
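
One way to elaborate on the problem is to report the exact class of the object being passed to validate. The helper below is a small sketch using only standard Python introspection; `describe_type` is a hypothetical name chosen for this comment and is not part of cuallee.

```python
def describe_type(obj: object) -> str:
    """Return the fully qualified class name of obj, e.g. 'builtins.list'."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


# Example with built-in objects; in the issue, the argument would be `df`.
list_type = describe_type([1, 2, 3])
dict_type = describe_type({"date": "2022-01-01"})
print(list_type)
print(dict_type)
```

Calling `describe_type(df)` on the dataframe from the snippet and posting the result here would show which data structure actually reaches validate.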