I'm struggling to understand why I ended up with an infinite k-distance, and whether it is a problem (or a peculiarity) with my data, but I keep running into this issue.
Substituting EPSILON when the local reachability density is zero seems to give sensible results. I'll raise a PR in case that's an approach you'd like to take.
The `LocalOutlierFactor` anomaly detector sometimes sets the k-distance to `INF` at https://github.com/RubixML/ML/blob/master/src/AnomalyDetectors/LocalOutlierFactor.php#L215. Then, when using contamination and running `localReachabilityDensity`, we take a max of the k-distances at https://github.com/RubixML/ML/blob/master/src/AnomalyDetectors/LocalOutlierFactor.php#L317 and divide 1 by it, giving zero. Lastly, this zero is used as a divisor at https://github.com/RubixML/ML/blob/master/src/AnomalyDetectors/LocalOutlierFactor.php#L295, which gives a division by zero error.
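
To make the chain above concrete, here is a minimal Python sketch of the arithmetic. It is not the library's PHP code: the function names, the `guard` flag, the sample numbers, and the `EPSILON` value are all assumptions for illustration, and the density calculation is simplified to the inverse of the mean reachability distance.

```python
import math

EPSILON = 1e-8  # hypothetical small constant, not necessarily the library's value


def local_reachability_density(distances, k_distances, guard=False):
    """Inverse of the mean reachability distance (simplified sketch)."""
    # Reachability distance: the larger of the actual distance
    # and the neighbor's k-distance.
    reach = [max(d, k) for d, k in zip(distances, k_distances)]
    mean = sum(reach) / len(reach)
    lrd = 1.0 / mean  # 1 / INF evaluates to 0.0
    if guard and lrd == 0.0:
        lrd = EPSILON  # proposed fallback so the density is never zero
    return lrd


def local_outlier_factor(lrd, neighbor_lrds):
    """Mean ratio of the neighbors' densities to the sample's own density."""
    return sum(n / lrd for n in neighbor_lrds) / len(neighbor_lrds)


# Made-up values: one neighbor carries an infinite k-distance.
distances = [0.5, 0.7, 0.9]
k_distances = [0.6, math.inf, 0.8]
neighbor_lrds = [1.2, 1.5, 1.1]

unguarded = local_reachability_density(distances, k_distances)
guarded = local_reachability_density(distances, k_distances, guard=True)
```

With these inputs, passing `unguarded` to `local_outlier_factor` raises `ZeroDivisionError`, while `guarded` yields a very large but finite score. The fallback avoids the crash, but it does not explain where the infinite k-distance comes from.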