FRosner / drunken-data-quality

Spark package for checking data quality
Apache License 2.0
222 stars, 69 forks

Add method to check if column values are in a static list of predefined values #27

Closed: ghost closed this issue 9 years ago

ghost commented 9 years ago

isAnyOf(columnName: String, listOfValues: Seq)

Make sure the list of values can contain string values with single quotes, e.g. "Doctor's Degree".

FRosner commented 9 years ago

@mfsny I think we should change the signature from Seq to Iterable. What do you think?

What do you mean by the following?

make sure the list of values can contain string values with single quotes, e.g. "Doctor's Degree"

Can't we just use an equality check for each of the given values? The list can be an Iterable of Any, so the check works not only for strings but for values of any type.
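A minimal sketch of the equality-based check suggested above, assuming rows are modeled as plain Scala maps instead of a Spark DataFrame so the example runs without Spark. The object name `IsAnyOfSketch` and the helper `violations` are hypothetical; only the parameter names `columnName` and `listOfValues` come from the proposal. Because it compares values with `==` (via `Set.contains`), a string such as "Doctor's Degree" needs no escaping, and non-string values work the same way.

```scala
// Hypothetical sketch: a "row" is modeled as a Map from column name to value.
// A real implementation would operate on a Spark DataFrame instead.
object IsAnyOfSketch {
  type Row = Map[String, Any]

  /** Rows whose value in `columnName` equals none of `listOfValues`.
    * A row that lacks the column also counts as a violation. */
  def violations(rows: Seq[Row], columnName: String, listOfValues: Iterable[Any]): Seq[Row] = {
    val allowed: Set[Any] = listOfValues.toSet
    rows.filterNot(row => row.get(columnName).exists(allowed.contains))
  }

  /** The constraint holds when no row violates it. */
  def isAnyOf(rows: Seq[Row], columnName: String, listOfValues: Iterable[Any]): Boolean =
    violations(rows, columnName, listOfValues).isEmpty

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(
      Map("degree" -> "Doctor's Degree", "level" -> 3),
      Map("degree" -> "Bachelor", "level" -> 1)
    )
    // Strings containing single quotes are compared as ordinary values.
    println(isAnyOf(rows, "degree", List("Bachelor", "Master", "Doctor's Degree"))) // true
    // Any Iterable of any type works, e.g. a Set of Ints.
    println(isAnyOf(rows, "level", Set(1, 2))) // false: level 3 is not allowed
  }
}
```

Taking an `Iterable[Any]` means callers can pass a `List`, `Seq` or `Set` unchanged; converting it to a `Set` once keeps each lookup cheap.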

ghost commented 9 years ago

Iterable is more generic, so that's fine with me.

What I meant is that our prototype had a problem with single quotes, because it checked this constraint with filter("columnName IN ('value1', 'value2', ..., 'valueN')"): an unescaped single quote inside a value such as Doctor's Degree ends the SQL string literal early. The tests should therefore also include strings containing single quotes.
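The prototype's failure can be reproduced without Spark. The sketch below is an assumption about how such a clause might be assembled (the object `InClauseSketch` and both helpers are hypothetical, not the prototype's code): it wraps each value in single quotes and joins them into one IN expression with no escaping. With "Doctor's Degree" in the list, the resulting expression has an odd number of quote characters, so a SQL parser cannot read it as intended. Comparing values directly, as in the equality-based approach, avoids building SQL text at all; Spark's DataFrame API also provides `Column.isin`, which takes the values as arguments rather than as a SQL string.

```scala
// Hypothetical sketch of naive IN-clause construction (no escaping).
object InClauseSketch {
  /** Wraps every value in single quotes and joins them into one IN clause. */
  def naiveInClause(columnName: String, values: Seq[String]): String =
    values.map(v => s"'$v'").mkString(s"$columnName IN (", ", ", ")")

  /** Heuristic: without escaping, a well-formed clause has an even number of
    * single quotes; an odd count means some literal was cut short. */
  def hasUnbalancedQuotes(sql: String): Boolean =
    sql.count(_ == '\'') % 2 != 0

  def main(args: Array[String]): Unit = {
    val clause = naiveInClause("degree", Seq("Bachelor", "Doctor's Degree"))
    println(clause)                      // degree IN ('Bachelor', 'Doctor's Degree')
    println(hasUnbalancedQuotes(clause)) // true
  }
}
```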