Tchanders / InformationMeasures.jl

Entropy, mutual information and higher order measures from information theory, with various estimators and discretisation methods.

Add cross-entropy (and Kullback-Leibler divergence?) #12

Closed: istvankleijn closed this 7 years ago

istvankleijn commented 7 years ago

I would like to use this package to sample the Kullback-Leibler (KL) divergence between two data sets.

The KL divergence can be calculated as the cross-entropy between the two data sets minus the entropy of the first one.
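For reference, here is a minimal sketch of that identity on plain probability vectors (hypothetical vectors over the same bins, not the package's estimators), assuming every bin has non-zero probability under both distributions:

```julia
# Two hypothetical discrete distributions over the same bins
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

entropy_p        = -sum(p .* log2.(p))           # H(P)
cross_entropy_pq = -sum(p .* log2.(q))           # H(P, Q)
kl_divergence    = cross_entropy_pq - entropy_p  # D_KL(P || Q) = H(P, Q) - H(P)
```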

In this PR I have added my implementation of the cross-entropy calculated from two sampled data sets. Eventually, I would like to calculate the KL divergence as well; however, a naive implementation such as

cross_entropy(values_x, values_y) - get_entropy(values_x)

does not handle zeroes correctly and can give negative values for the KL divergence. This is because cross_entropy excludes non-finite frequencies, but this information is not relayed to get_entropy, so the two terms end up being computed over different sets of bins.
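One way around this, sketched below, would be to compute the divergence in a single pass over the same bins, so zeroes are handled consistently in both terms. (kl_from_frequencies is a hypothetical helper, not part of this PR; it assumes p and q are normalised frequency vectors over identical bins.)

```julia
# Sketch only: single-pass KL divergence over a common support.
function kl_from_frequencies(p, q)
    kl = 0.0
    for (p_i, q_i) in zip(p, q)
        p_i > 0 || continue      # bins with zero probability under P contribute nothing
        q_i > 0 || return Inf    # P puts mass where Q does not: divergence is infinite
        kl += p_i * log2(p_i / q_i)
    end
    return kl
end
```

Subtracting entropies that were computed over different sets of bins, as in the naive expression above, loses this consistency, which is how the negative values arise.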

istvankleijn commented 7 years ago

Updated using your suggestions, thanks. I'll work on the KL divergence today; I'll make the commit consistent this time, sorry :)

Tchanders commented 7 years ago

Not at all, this is great - thanks again!