elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0
780 stars 321 forks source link

INFLO does not compute RNN correctly #43

Closed mridulm closed 6 years ago

mridulm commented 6 years ago

RNN computed is only based on the neighbor's of the point; and does not include reverse neighbors which are not in the current point's k neighbors.

This is based on de.lmu.ifi.dbs.elki.algorithm.outlier.lof. INFLO#computeNeighborhoods

As currently written, RNN will always be a subset of knn.

kno10 commented 6 years ago

Did you look at algorithm 2 in the INFLO paper? It seems to closely match our code. While I do agree that this does not find the "real" RNN, it seems to be literally what the INFLO paper proposes in algorithm 2. It is a fairly common thing that a paper has some inconsistency like this, and I'd rather stick close to the paper's pseudocode, in particular since the paper doesn't appear to include a proper definition of IS_k.

There is a different bug in our INFLO code though, f288bbac79e09b40fddfa6fdeb94cd72104782b8, which caused it to consider points twice that were in the kNN and RkNN; probably due to some refactoring.