aaronmcdaid / Overlapping-NMI

51 stars 24 forks source link

A bug in calculating LFKNMI #3

Closed amirubin87 closed 6 years ago

amirubin87 commented 6 years ago

Hi Aaron! While testing your output against mine, I got differnt LFK-NMI scores.

I'm referring to the function LFKNMI in line 290. Looking in your code, I think I found the root cause (after I changed it I got the same output as mine..): It seems that using the VI_oneSide<true> with g1 and g2 - which, according to your paper, should give H(g1|g2)/H(g1) is not giving the same as the straight forward calculation of it: `

double H_g2 = 0.0;
for(int toId = 0; toId < (int)g2.size(); toId++) {
    const int x = g2.at(toId).size();
    H_g2 += H(x, om.N)+H(om.N-x, om.N);
}
double H_g1 = 0.0;
for(int fromId = 0; fromId < (int)g1.size(); fromId++) {
    const int x = g1.at(fromId).size();
    H_g1 += H(x, om.N)+H(om.N-x, om.N);
}
// Division added by Amir Rubin
// Param changed to false by Amir Rubin
return 1.0 - 0.5 *
    ( (VI_oneSide<false>(omFlipped, g1, g2)/H_g1)
    + (VI_oneSide<false>(om       , g2, g1)/H_g2 ));`

As a test case - look at the two collections: {{1,2},{3,5},{4},{6}} and {{1,3},{4,5},{2},{6}}. Your code gives: lfkNMI: 0.470025, and after the proposed fix you get: lfkNMI: 0.423228.

aaronmcdaid commented 6 years ago

(I've closed this issue, but I'm very happy to reopen if there is a problem!)

Hi, thanks for your input and sorry for the late reply. I don't think any fix is needed as my code currently prints the numbers you expect. With those two files, I get this output:

NMI<Max>:       0.423228
Other measures:
  lfkNMI:       0.470025
  NMI<Sum>:     0.423228

You say that 0.428228 is the correct answer, and it appears here as the result for both NMI<Max> and NMI<Sum>. The number for lfkNMI (0.470025) is intended as a replication of the method of Lancichinetti, Fortunato and Kertész, which I describe in equation (7) of my pdf (http://arxiv.org/abs/1110.2515). The division in equation (7) is implemented on line 262 of onmi.cpp for one of the two terms:

const double H_X = H(x,N) + H(N-x,N);
const double norm = unnorm / H_X;

The code you suggest is already implemented on line 323.

Perhaps I have misunderstood your comment? I wrote this over six years ago and maybe I don't understand it any more!