MalfoyJW opened this issue 4 years ago.
Were you able to find a solution? Unfortunately, I'm running into the same issue.
A workaround is to replace empty strings with a placeholder string and then apply hasPattern to the updated column, for example: inputDF.withColumn("NEW_COL", regexp_replace(col("COL_NAME"), "^$", "N/A"))
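Here is a minimal plain-Scala sketch of why this workaround helps. It runs without Spark and rests on one assumption not stated in this thread: Deequ's hasPattern counts a row as matching only when the extracted match (as with Spark's regexp_extract(col, pattern, 0)) is a non-empty string. String.replaceAll stands in for Spark's regexp_replace, since both use java.util.regex. The helper names and the pattern `[\d]+|N/A` for the updated column are hypothetical.

```scala
import scala.util.matching.Regex

object WorkaroundSketch {
  // Hypothetical stand-in for Deequ's per-row check (assumption): a row matches only
  // if the first regex match, like regexp_extract(col, pattern, 0), is non-empty.
  def deequStyleMatch(value: String, pattern: Regex): Boolean =
    pattern.findFirstIn(value).getOrElse("").nonEmpty

  // Fraction of rows that pass the check, mirroring the metric hasPattern reports.
  def patternRatio(values: Seq[String], pattern: Regex): Double =
    values.count(deequStyleMatch(_, pattern)).toDouble / values.size

  // Stand-in for regexp_replace(col("COL_NAME"), "^$", "N/A"): "" becomes "N/A",
  // non-empty strings are left unchanged.
  def replaceEmpty(values: Seq[String]): Seq[String] =
    values.map(_.replaceAll("^$", "N/A"))

  def main(args: Array[String]): Unit = {
    // 30 empty strings and 70 digit strings, as described in the issue.
    val column = Seq.fill(30)("") ++ Seq.fill(70)("123")

    val before = patternRatio(column, """[\d]+|^$""".r)
    // Hypothetical pattern for the updated column: digits or the placeholder.
    val after = patternRatio(replaceEmpty(column), """[\d]+|N/A""".r)

    println(s"before workaround: $before") // before workaround: 0.7
    println(s"after workaround: $after")   // after workaround: 1.0
  }
}
```

Because the placeholder is a non-empty string, its match is non-empty too, so it survives the assumed "extracted value must not be empty" rule.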
I have a dataset in which one column has 100 rows: 30 rows are empty strings ("") and 70 rows match the pattern [\d]+. isComplete("columnName") returns 1.0, so I believe this column contains no nulls.
However, when I run the following check in my Scala project, the empty strings are not counted as matches: .hasPattern("columnName", """[\d]+|^$""".r) returns 0.7, not 1.0. By contrast, in a Jupyter notebook the same regex captures all rows: df.filter($"columnName" rlike "[\d]+|^$").count() returns 100.
Is there a special expression for matching empty strings in Deequ?
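The gap between 0.7 and 100 rows can be illustrated with a plain-Scala sketch that needs no Spark. It rests on one assumption: Deequ's hasPattern appears to score a row via regexp_extract and to treat an empty extracted string as "no match". Spark's rlike, by contrast, succeeds whenever the regex finds a match, including a zero-length one. `rlikeMatch` and `extractNonEmpty` are hypothetical helper names.

```scala
import scala.util.matching.Regex

object RlikeVsExtractSketch {
  // Like Spark's rlike: true if the regex finds any match, even a zero-length one.
  def rlikeMatch(value: String, pattern: Regex): Boolean =
    pattern.findFirstIn(value).isDefined

  // Assumed Deequ-style check: the extracted match itself must be non-empty.
  def extractNonEmpty(value: String, pattern: Regex): Boolean =
    pattern.findFirstIn(value).exists(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    val pattern = """[\d]+|^$""".r
    // 30 empty strings and 70 digit strings, as described in the issue.
    val column = Seq.fill(30)("") ++ Seq.fill(70)("123")

    println(column.count(rlikeMatch(_, pattern)))      // 100
    println(column.count(extractNonEmpty(_, pattern))) // 70, i.e. a ratio of 0.7
    // "" does match ^$, but the matched text is itself "", so the
    // extract-based check discards it.
    println(pattern.findFirstIn("") == Some(""))       // true
  }
}
```

Under this assumption, no regex alternative can make an empty string count as a match, because whatever it matches inside "" is also empty. That is consistent with replacing empty strings with a non-empty placeholder before calling hasPattern.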