Open thvasilo opened 2 years ago
While these params are still defined as required... When it comes to binningUdf you could simply set it as None.
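from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite

# Pass None for binningUdf; maxBins (2 here) still has to be supplied.
check = Check(spark, CheckLevel.Warning, "test hasHistogramValues")
result = (VerificationSuite(spark).onData(df)
    .addCheck(check
        .hasHistogramValues("c_1",
            lambda x: x.apply("66").absolute() > 4500000, None, 2)
        .hasHistogramValues("c_2",
            lambda x: x.apply("22").ratio() > 0.5, None, 2))
).run()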
Thanks @brunoRenzo6, adding this to the method's docs could help.
Describe the bug
The docs and Scala code for hasNumberOfDistinctValues and hasHistogramValues indicate that providing the binningUdf and maxBins parameters should be optional, but from the function definitions they seem to be required.

To Reproduce
Steps to reproduce the behavior:
check.hasNumberOfDistinctValues('column_name', lambda x: x == 6)

Expected behavior
I'd like to be able to call hasNumberOfDistinctValues and hasHistogramValues without specifying a binning function and maxBins.
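
Until the signatures change, one possible stopgap is a pair of thin wrappers that fill in the two trailing arguments. This is only a sketch: the helper names and DEFAULT_MAX_BINS are hypothetical and not part of PyDeequ, and the value 1000 is an assumption about the default used on the Scala side, so check it against your Deequ version. The wrappers only rely on the positional order (column, assertion, binningUdf, maxBins) used elsewhere in this thread.

```python
# Hypothetical helpers, not part of PyDeequ.
# Assumption: 1000 mirrors the Scala-side default for maxBins; verify before relying on it.
DEFAULT_MAX_BINS = 1000


def has_number_of_distinct_values(check, column, assertion,
                                  binning_udf=None, max_bins=DEFAULT_MAX_BINS):
    """Call check.hasNumberOfDistinctValues, defaulting binningUdf and maxBins."""
    return check.hasNumberOfDistinctValues(column, assertion, binning_udf, max_bins)


def has_histogram_values(check, column, assertion,
                         binning_udf=None, max_bins=DEFAULT_MAX_BINS):
    """Call check.hasHistogramValues, defaulting binningUdf and maxBins."""
    return check.hasHistogramValues(column, assertion, binning_udf, max_bins)
```

With these in place, the call from the reproduction step could be written as has_number_of_distinct_values(check, 'column_name', lambda x: x == 6), and the returned check can still be chained as usual.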