alan-turing-institute / datadiff

Datadiff is diff for data
MIT License

Extend column-accuracy metric to give partial credit. #13

Closed thobson88 closed 7 years ago

thobson88 commented 7 years ago

Currently, column-accuracy is a Boolean condition. A result patch is "column-accurate" w.r.t. a corruption (and a particular patch type) iff the set of columns transformed by the result is equal to the set of columns transformed by the corruption.

If necessary (depending on how well the datadiff algorithm performs), replace the Boolean column-accuracy returned by the is_column_accurate function with a rational number: the fraction of the columns transformed by the corruption that are also transformed by the result. (This requires only a minor revision of is_column_accurate.)
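
To make the difference between the two metrics concrete, here is a minimal sketch in Python. It is an illustration only, not the package's code: the function signatures, the representation of a patch as a plain set of column names, and the handling of an empty corruption are all assumptions made for the example.

```python
from fractions import Fraction


def is_column_accurate(result_cols, corruption_cols):
    """Boolean metric: True iff both patches transform exactly the same columns.

    Hypothetical signature: each argument is an iterable of column names.
    """
    return set(result_cols) == set(corruption_cols)


def column_accuracy(result_cols, corruption_cols):
    """Partial-credit metric: the fraction of the columns transformed by the
    corruption that are also transformed by the result.
    """
    corrupted = set(corruption_cols)
    if not corrupted:
        # Assumption: a corruption that transforms no columns is treated as
        # fully matched, which avoids a division by zero.
        return Fraction(1)
    matched = corrupted & set(result_cols)
    return Fraction(len(matched), len(corrupted))


# The corruption transforms a, b and c; the result transforms a, b and x.
print(is_column_accurate({"a", "b", "x"}, {"a", "b", "c"}))  # False
print(column_accuracy({"a", "b", "x"}, {"a", "b", "c"}))     # 2/3
```

Under this definition, extra columns transformed only by the result (x above) do not lower the score, so a score of 1 is a weaker condition than the Boolean metric being true.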