Motivation
The final step of the pipeline takes a set of results expressed as a dictionary of `{threshold: metrics_dict}`, a target value with which to calculate the AU-GOOD, a target dataset, and the set of similarity metrics used to calculate the partitions. It then calculates the similarity between the original data and the target dataset.
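For illustration only, a hypothetical shape for these inputs (thresholds, metric names, and numbers are invented, and it is an assumption here that the target value names the metric to aggregate):

```python
# Hypothetical inputs for the final pipeline step (illustrative only).
results = {
    0.3: {"accuracy": 0.91, "mcc": 0.78},  # metrics for the partition at threshold 0.3
    0.5: {"accuracy": 0.86, "mcc": 0.70},
    0.7: {"accuracy": 0.79, "mcc": 0.61},
}
target_value = "mcc"      # assumed to name the metric to aggregate into the AU-GOOD
target_dataset = [...]    # placeholder for the deployment/target data
```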
Possible implementation
1. Calculate the similarities between the query and target distributions, using the already implemented functions.
2. Create a histogram with the same minimum and maximum values as the partitions, and with the same step size for the bins.
3. Normalise the histogram (`counts / counts.sum()`).
4. To get the AU-GOOD, perform the dot product between the normalised counts and the metric values. This is equivalent to `sum(a*b)`, which is the finite form of the AU-GOOD integral (see the sketch after this list).
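A minimal sketch of steps 1–4, assuming NumPy, evenly spaced thresholds (so their step can double as the histogram bin width), and that the similarities from step 1 are passed in already computed by the existing similarity functions; `calculate_au_good` and its arguments are hypothetical names:

```python
import numpy as np

def calculate_au_good(results, target_value, similarities):
    # results: {threshold: metrics_dict}; target_value: name of the metric to aggregate.
    # similarities: query-vs-target similarities, precomputed with the already
    # implemented similarity functions (step 1).
    thresholds = sorted(results)
    values = np.array([results[t][target_value] for t in thresholds])
    thresholds = np.array(thresholds)

    # Step 2: histogram with the same min, max, and step as the partitions.
    step = thresholds[1] - thresholds[0]
    bin_edges = np.append(thresholds, thresholds[-1] + step)
    counts, _ = np.histogram(similarities, bins=bin_edges)

    # Step 3: normalise the histogram.
    norm_counts = counts / counts.sum()

    # Step 4: dot product of the normalised counts with the metric values,
    # i.e. sum(a*b), the finite form of the AU-GOOD integral.
    return float(np.dot(norm_counts, values))
```

With the hypothetical inputs shown above, `calculate_au_good(results, target_value, similarities)` would return the scalar AU-GOOD.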
Alternatives
4b. Calculate `a*b` and `sum(a*b)` separately, so that the user has access to the GOOD curve and can plot it if they so desire (a sketch of this variant follows).
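A sketch of alternative 4b under the same assumptions, splitting the element-wise product from its sum so the GOOD curve itself can be returned (names are hypothetical):

```python
import numpy as np

def calculate_good_curve(norm_counts, values):
    # Element-wise product: each bin's contribution to the AU-GOOD
    # for the target distribution (the GOOD curve).
    good_curve = np.asarray(norm_counts) * np.asarray(values)
    au_good = float(good_curve.sum())  # AU-GOOD = sum(a*b)
    return good_curve, au_good
```

Exposing the curve means a user can plot it directly against the thresholds, while the scalar AU-GOOD remains available as the summary value.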