FRosner / drunken-data-quality

Spark package for checking data quality
Apache License 2.0
222 stars 69 forks

Programmatic access to failed rows #95

Open FRosner opened 8 years ago

FRosner commented 8 years ago

Problem

A user wants programmatic access to the rows that fail a constraint, for example to persist them in an error table.

However, designing the API is not trivial, for two main reasons:

  1. It is not general enough to apply to all checks, as it only works for row-wise checks.
  2. Because the logic is not binary (we have to deal with null), it is not obvious how to compute the other side when given only the failing rows or only the satisfying rows.
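
To illustrate the second point, here is a minimal sketch in plain Python (no Spark involved) that models a row-wise check whose result can be true, false, or null. The names `age_is_adult` and `partition`, and the sample rows, are hypothetical and are not part of this library's API. They only show why "failing" and "not satisfying" can be different sets.

```python
from typing import Callable, Optional

Row = dict
# A row-wise check returns True, False, or None (None models SQL null / unknown).
Predicate = Callable[[Row], Optional[bool]]


def age_is_adult(row: Row) -> Optional[bool]:
    """Hypothetical constraint: age >= 18, unknown when age is missing."""
    age = row.get("age")
    if age is None:
        return None
    return age >= 18


def partition(rows: list, predicate: Predicate):
    """Split rows into satisfying, failing, and unknown (null result) groups."""
    satisfying, failing, unknown = [], [], []
    for row in rows:
        result = predicate(row)
        if result is None:
            unknown.append(row)
        elif result:
            satisfying.append(row)
        else:
            failing.append(row)
    return satisfying, failing, unknown


# Made-up sample data for illustration only.
rows = [
    {"id": 1, "age": 30},
    {"id": 2, "age": 12},
    {"id": 3, "age": None},
]

satisfying, failing, unknown = partition(rows, age_is_adult)

# The complement of the satisfying rows also contains the null case,
# so it is not the same set as the rows that evaluated to False.
not_satisfying = [row for row in rows if row not in satisfying]
```

In this sketch the row with a missing age lands in neither the satisfying nor the failing group, so an API that exposes "failed rows" would need to state explicitly which of the two definitions it uses.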