Hoosier-Clusters / clusim

An extended package for clustering similarity
MIT License

Adding RMI #29

Closed by jg-you 5 years ago

jg-you commented 5 years ago

As mentioned in issue #27, this PR adds the reduced mutual information (introduced in this preprint).

Some remarks: I defined get_log_omega as a private function. It calculates the log of the number of contingency tables with fixed row and column margins, which didn't seem like something that needed to be exposed. I can move it if you think that would be better.
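To make the counted quantity concrete, here is a minimal brute-force sketch that enumerates every non-negative integer table with the given margins. It is only an illustration for tiny inputs: the name count_tables is hypothetical, and this is not how get_log_omega in the PR works (exhaustive enumeration becomes infeasible very quickly as the margins grow).

```python
from functools import lru_cache
from math import log


def count_tables(row_sums, col_sums):
    """Count non-negative integer matrices with the given row and column sums.

    Hypothetical helper for illustration only; brute force, tiny inputs only.
    """
    row_sums = tuple(row_sums)
    col_sums = tuple(col_sums)
    if sum(row_sums) != sum(col_sums):
        return 0  # inconsistent margins admit no table

    def compositions(total, caps):
        # Yield every way to split `total` across the columns without
        # exceeding the capacity still available in each column.
        if not caps:
            if total == 0:
                yield ()
            return
        for x in range(min(total, caps[0]) + 1):
            for rest in compositions(total - x, caps[1:]):
                yield (x,) + rest

    @lru_cache(maxsize=None)
    def fill(row_index, remaining):
        # `remaining` holds the column totals not yet used by earlier rows.
        if row_index == len(row_sums):
            return 1 if all(c == 0 for c in remaining) else 0
        total = 0
        for row in compositions(row_sums[row_index], remaining):
            new_remaining = tuple(c - x for c, x in zip(remaining, row))
            total += fill(row_index + 1, new_remaining)
        return total

    return fill(0, col_sums)


# Log of the number of tables for two clusterings with margins (2, 2) and (2, 2).
log_omega_small = log(count_tables((2, 2), (2, 2)))
```

For margins (2, 2) and (2, 2) the tables are [[a, 2 - a], [2 - a, a]] with a in {0, 1, 2}, so the count is 3.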

I also did not use the function nmi to calculate the mutual information, because we use an "exact version" of the MI, calculated combinatorially, that only equals the more standard MI in the limit of large clusters (see Eq. (24) of our preprint).
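The difference between the two quantities can be sketched as follows. This assumes the combinatorial form I = (1/n) log[ n! ∏ c_rs! / (∏ a_r! ∏ b_s!) ], where a_r and b_s are the cluster sizes and c_rs the contingency-table entries; that form is an assumption of this sketch, and the preprint's equation remains the authoritative statement. The function names are hypothetical and are not taken from the PR.

```python
from collections import Counter
from math import lgamma, log


def _log_factorial(k):
    # log(k!) computed through the log-gamma function to avoid overflow.
    return lgamma(k + 1)


def _check(labels_a, labels_b):
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")


def exact_mutual_information(labels_a, labels_b):
    """Combinatorial MI in nats (assumed form, see the lead-in)."""
    _check(labels_a, labels_b)
    n = len(labels_a)
    a = Counter(labels_a)                # sizes of the clusters in A
    b = Counter(labels_b)                # sizes of the clusters in B
    c = Counter(zip(labels_a, labels_b))  # contingency-table entries
    log_count = (
        _log_factorial(n)
        + sum(_log_factorial(v) for v in c.values())
        - sum(_log_factorial(v) for v in a.values())
        - sum(_log_factorial(v) for v in b.values())
    )
    return log_count / n


def plugin_mutual_information(labels_a, labels_b):
    """Standard plug-in MI in nats, from empirical frequencies."""
    _check(labels_a, labels_b)
    n = len(labels_a)
    a = Counter(labels_a)
    b = Counter(labels_b)
    c = Counter(zip(labels_a, labels_b))
    return sum((v / n) * log(v * n / (a[r] * b[s])) for (r, s), v in c.items())


# Two identical clusterings: the gap shrinks as the clusters grow.
small = [0, 0, 1, 1]
large = [0] * 5000 + [1] * 5000
gap_small = plugin_mutual_information(small, small) - exact_mutual_information(small, small)
gap_large = plugin_mutual_information(large, large) - exact_mutual_information(large, large)
```

Under this form, the two values differ noticeably for the four-element example and nearly coincide for the ten-thousand-element one, which follows from Stirling's approximation applied to the factorials.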

yy commented 5 years ago

Thanks! I haven't closely examined or tested it yet (I hope @ajgates42 can take a closer look), but overall it looks good! I think having get_log_omega as a private function is fine for now; I don't see much use for it in other cases at the moment either. And thanks for updating the bib for our papers!

ajgates42 commented 5 years ago

Thanks @jg-you, this looks great!