Closed prashantprp closed 3 years ago
The output means that 88.1% of the rows contain data of type Fractional. Since the constraint looks for a 100% match, it fails. 0.881 is the ratio of rows satisfying the constraint to the total number of rows.
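To illustrate where a ratio like that can come from, here is a simplified sketch (not deequ's actual implementation, which infers types on the JVM side): when a CSV column is read as strings, values like `"7"` are detected as Integral rather than Fractional, so the Fractional ratio drops below 1.0. The sample values below are hypothetical.

```python
def is_integral(s):
    """True if the string parses as a whole number (deequ's Integral category)."""
    try:
        int(s)
        return True
    except ValueError:
        return False

def is_fractional(s):
    """True if the string parses as a number with a decimal part (Fractional)."""
    try:
        float(s)
    except ValueError:
        return False
    return not is_integral(s)

# Hypothetical sample of a "Rating" column read as strings from CSV.
ratings = ["9.1", "7", "8.4", "6", "5.3", "9.6", "4", "8.8"]

fractional_ratio = sum(is_fractional(r) for r in ratings) / len(ratings)
print(fractional_ratio)  # 0.625 here; 0.881 in the reported dataset
```

Only the whole-number ratings fail the Fractional check, which is exactly why a mostly-decimal column can still score below 1.0.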
Use `.hasDataType(column, ConstrainableDataTypes.Numeric)` if you want to allow both Integral and Fractional column types.
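A minimal sketch of why Numeric passes where Fractional alone fails (illustrative only; the category names mirror deequ's, but this is not its implementation):

```python
def detect_type(s):
    """Classify a string value into a simplified deequ-style category."""
    try:
        int(s)
        return "Integral"
    except ValueError:
        pass
    try:
        float(s)
        return "Fractional"
    except ValueError:
        return "String"

def ratio_of(values, allowed):
    """Fraction of values whose detected type is in the allowed set."""
    return sum(detect_type(v) in allowed for v in values) / len(values)

ratings = ["9.1", "7", "8.4", "6", "5.3"]  # hypothetical mixed column

print(ratio_of(ratings, {"Fractional"}))              # 0.6 -> Fractional alone fails
print(ratio_of(ratings, {"Integral", "Fractional"}))  # 1.0 -> Numeric-style check passes
```

Treating Numeric as the union of Integral and Fractional is what makes the constraint tolerant of whole-number entries in an otherwise decimal column.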
FWIW, my fork of deequ will suggest a constraint with Numeric if your column contains both types: https://github.com/aviatesk/deequ/pull/2
Closing due to inactivity
I am using the supermarket_sales.csv dataset:

```python
import pydeequ
from pydeequ.analyzers import *

df = spark.read.option("header", "true").csv("/FileStore/tables/supermarket_sales.csv")
```

`.hasDataType("Rating", ConstrainableDataTypes.Fractional)` returns false, citing `Value: 0.881 does not meet the constraint requirement`. But this is a small 1000-row dataset and there is no 0.881 value in the Rating column, so where does deequ pull this number from?

supermarket_sales.zip