RubixML / ML

A high-level machine learning and deep learning library for the PHP language.
https://rubixml.com
MIT License

Division by Zero in LocalOutlierFactor #334

Closed: dmnc closed this issue 1 month ago

dmnc commented 1 month ago

The LocalOutlierFactor anomaly detector sometimes sets the k-distance to INF at https://github.com/RubixML/ML/blob/master/src/AnomalyDetectors/LocalOutlierFactor.php#L215.

Then, when a contamination is set and localReachabilityDensity runs, we take a max of the k-distances at https://github.com/RubixML/ML/blob/master/src/AnomalyDetectors/LocalOutlierFactor.php#L317 and divide 1 by it. With an infinite k-distance, 1 / INF gives zero.

Lastly, this zero is used as a divisor at https://github.com/RubixML/ML/blob/master/src/AnomalyDetectors/LocalOutlierFactor.php#L295, which raises a division by zero error.
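
The chain described above can be reproduced in isolation. The following is a minimal sketch in Python, not the library's PHP code; the function names and sample values are hypothetical, and the density calculation is reduced to the "max of the k-distances, then 1 divided by it" step mentioned in this report.

```python
import math


def local_reachability_density(k_distances):
    # Simplified assumption: invert the largest k-distance.
    # An infinite k-distance makes the result exactly 0.0.
    return 1.0 / max(k_distances)


def outlier_factor(neighbor_density, sample_density):
    # The density is the divisor here, so 0.0 raises an error.
    return neighbor_density / sample_density


# One neighbor has an infinite k-distance (hypothetical data).
density = local_reachability_density([0.5, 1.2, math.inf])

try:
    outlier_factor(0.8, density)
    error = None
except ZeroDivisionError as exc:
    error = exc
```

Here `density` comes out as zero, and the later division fails because that zero is the divisor, which matches the three steps listed above.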

I'm struggling to understand why I ended up with an infinite k-distance, and whether it is a problem (or a peculiarity) with my data, but I keep running into this issue.

dmnc commented 1 month ago

Substituting an EPSILON when the local reachability density is zero seems to give sensible results. I'll raise a PR in case that's an approach you'd like to take.
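
A minimal sketch of that kind of guard, again in Python rather than the library's PHP. The `EPSILON` value, the function names, and the simplified density formula are assumptions for illustration and may differ from what the PR actually changes.

```python
import math

EPSILON = 1e-8  # assumed value, chosen only for this illustration


def safe_density(k_distances, epsilon=EPSILON):
    # Simplified assumption: invert the largest k-distance.
    density = 1.0 / max(k_distances)
    # Fall back to a small positive constant when the density is zero,
    # so that it can still be used as a divisor later.
    return density if density > 0.0 else epsilon


def outlier_factor(neighbor_density, sample_density):
    return neighbor_density / sample_density


# An infinite k-distance no longer leads to a zero divisor.
guarded = safe_density([0.5, 1.2, math.inf])
score = outlier_factor(0.8, guarded)
```

With the guard in place the score is very large but finite, so a sample with an infinite k-distance is ranked as a strong outlier instead of aborting the run.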

andrewdalpino commented 1 month ago

Thank you!