Incremental ColumnProfiler

I'd like to use the ColumnProfiler incrementally, i.e. combine the result of a previous run with the current data.

For example, we have:

case class Student(name: String, surname: String, middleName: Option[String])

and two different runs (on a daily basis):

Run 1 (yesterday), with Student("ciccio", "pasticcio", None), gives us Completeness("middleName") = 0.
Run 2 (today), with Student("ciccio", "pasticcio", Some("the best")), has a local Completeness("middleName") = 1, but combined with Run 1 I'd like to get a Completeness of 0.5, i.e. (0 + 1) / 2.
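The expected arithmetic can be sketched in plain Scala, with no Spark or deequ involved. `CompletenessCounts`, `countsOf` and `merge` are hypothetical names used only for illustration, not deequ's API. The idea is that completeness is the share of non-null values, so combining two runs means adding up their non-null and total counts, which here gives 1 / 2 = 0.5.

```scala
// Plain-Scala illustration only (no Spark, no deequ).
case class Student(name: String, surname: String, middleName: Option[String])

// Hypothetical type: the two counts needed to compute completeness.
case class CompletenessCounts(nonNull: Long, total: Long) {
  def completeness: Double =
    if (total == 0) Double.NaN else nonNull.toDouble / total

  // Combining two runs = summing their counts, not averaging their ratios.
  def merge(other: CompletenessCounts): CompletenessCounts =
    CompletenessCounts(nonNull + other.nonNull, total + other.total)
}

object IncrementalCompletenessExample {
  // Hypothetical helper: counts rows with a defined middleName.
  def countsOf(rows: Seq[Student]): CompletenessCounts =
    CompletenessCounts(rows.count(_.middleName.isDefined).toLong, rows.size.toLong)

  val run1 = Seq(Student("ciccio", "pasticcio", None))             // yesterday
  val run2 = Seq(Student("ciccio", "pasticcio", Some("the best"))) // today

  val counts1 = countsOf(run1)
  val counts2 = countsOf(run2)
  val combined = counts1.merge(counts2)

  def main(args: Array[String]): Unit = {
    println(s"run 1: ${counts1.completeness}")      // 0.0
    println(s"run 2: ${counts2.completeness}")      // 1.0
    println(s"combined: ${combined.completeness}")  // 0.5
  }
}
```

Summing counts rather than averaging the two ratios matters once the runs have different row counts: with 1 complete row out of 1 today and 0 out of 3 yesterday, the pooled completeness is 1/4 = 0.25, while a plain average of the two ratios would give 0.5.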

Code I'm using:

val result = ColumnProfilerRunner()
  .onData(validDf)
  .restrictToColumns(Seq("middleName"))
  .useRepository(repository)
  .reuseExistingResultsForKey(ResultKey(1636556003353L))
  .saveOrAppendResult(currentRunResultKey)
  .run()

where ResultKey(1636556003353L) is the key of the first run.

If I reuse the previous run's key, I get Completeness = 0 (instead of 0.5).
If I don't reuse the existing result key, I get Completeness = 1 (correct for the run 2 data alone).

How can I achieve this result? Is it possible?

Thanks

awslabs / deequ

Incremental ColumnProfiler #397