I'd like to use a ColumnProfiler to keep track of a previous result together with the current data.
For example, we have
case class Student(name: String, surname: String, middleName: Option[String])
and two runs, one per day:
First run, yesterday, with Student("ciccio", "pasticcio", None), which gives Completeness("middleName") = 0.
Second run, today, with Student("ciccio", "pasticcio", Some("the best")), which has a local Completeness("middleName") = 1. Combined with run 1, I'd like to get a Completeness of 0.5, i.e. (1 + 0) / 2.
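To make the expected number concrete, here is a minimal plain-Scala sketch of the arithmetic I have in mind for the two runs above. It uses no Spark and no Deequ. The names `CompletenessState`, `middleNameState` and `CombinedCompleteness` are hypothetical, made up for illustration, and the assumption is that the combined value is the pooled ratio of non-null values to total rows across both runs.

```scala
object CombinedCompleteness {
  final case class Student(name: String, surname: String, middleName: Option[String])

  // Hypothetical per-run summary (not a Deequ type): non-null values and total rows.
  final case class CompletenessState(nonNull: Long, total: Long) {
    // Pool two runs by adding their counts.
    def merge(other: CompletenessState): CompletenessState =
      CompletenessState(nonNull + other.nonNull, total + other.total)

    // Fraction of rows with a value; 0.0 for an empty run.
    def ratio: Double = if (total == 0L) 0.0 else nonNull.toDouble / total
  }

  // Summarise the middleName column of one run.
  def middleNameState(rows: Seq[Student]): CompletenessState =
    CompletenessState(rows.count(_.middleName.isDefined).toLong, rows.size.toLong)

  val run1: CompletenessState =
    middleNameState(Seq(Student("ciccio", "pasticcio", None)))
  val run2: CompletenessState =
    middleNameState(Seq(Student("ciccio", "pasticcio", Some("the best"))))

  // What I would like the profiler to report after run 2.
  val combined: CompletenessState = run1.merge(run2)
}
```

With one row per run, pooling the counts gives the same value as averaging the two per-run ratios; with runs of different sizes the two would differ, and the pooled ratio is the one I'm after.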
The code I'm using:
val result = ColumnProfilerRunner()
  .onData(validDf)
  .restrictToColumns(Seq("middleName"))
  .useRepository(repository)
  .reuseExistingResultsForKey(ResultKey(1636556003353L))
  .saveOrAppendResult(currentRunResultKey)
  .run()
where ResultKey(1636556003353L) is the key of the first run.
If I reuse the previous run's key, I get Completeness = 0 (instead of 0.5).
If I don't reuse the existing result key, I get Completeness = 1 (correct for run 2's data alone).
How can I get the combined result? Is it possible?
Thanks