gdkrmr / WeightedOnlineStats.jl

Weighted version of OnlineStats.jl
MIT License

Division by zero if o.W == 0 results in `NaN` values #42

Open · rlefmann opened 3 years ago

rlefmann commented 3 years ago

Some of the algorithms, like WeightedMean and WeightedCovMatrix, return NaN values if the first weight is zero. Example:

```julia
using WeightedOnlineStats

m = WeightedMean()
x = ones(3)
w = [0.0, 0.5, 1.0]
fit!(m, x, w)
# WeightedMean: ∑wᵢ=1.5 | value=NaN

# but if we reverse the sequences everything works as expected:
m = WeightedMean()
fit!(m, reverse(x), reverse(w))
# WeightedMean: ∑wᵢ=1.5 | value=1.0
```

If at any point in the computation the sum of the weights seen so far, o.W, is zero, the update divides by zero and produces NaN values. For the weighted mean, this could be fixed by leaving mu unchanged in that case.
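A minimal sketch of that fix, written against a hypothetical `SketchMean` type rather than the package's `WeightedMean`. The fields `n`, `W`, `μ` and the incremental update `μ += (w / W) * (x - μ)` are assumptions for illustration, not the package source. The unguarded version reproduces the `NaN` from the example; the guarded one skips the mean update while `W` is zero.

```julia
# Hypothetical stand-in for a weighted mean; NOT the WeightedOnlineStats type.
mutable struct SketchMean
    n::Int        # number of observations seen
    W::Float64    # running sum of weights
    μ::Float64    # current weighted mean
end
SketchMean() = SketchMean(0, 0.0, 0.0)

# Unguarded update: divides by the running weight sum, so W == 0 gives 0/0 = NaN.
function update_unguarded!(o::SketchMean, x::Real, w::Real)
    o.n += 1
    o.W += w
    o.μ += (w / o.W) * (x - o.μ)
    return o
end

# Guarded update: while the running weight sum is zero, leave μ unchanged.
function update_guarded!(o::SketchMean, x::Real, w::Real)
    o.n += 1
    o.W += w
    if !iszero(o.W)
        o.μ += (w / o.W) * (x - o.μ)
    end
    return o
end

x = ones(3)
w = [0.0, 0.5, 1.0]

a = SketchMean(); foreach((xi, wi) -> update_unguarded!(a, xi, wi), x, w)
b = SketchMean(); foreach((xi, wi) -> update_guarded!(b, xi, wi), x, w)
```

With the weights from the report, `a.μ` is `NaN` and `b.μ` is `1.0`, matching the reversed-order result. Skipping the update is exact for leading zero weights, since those observations contribute nothing to the weighted sum. It does not recover the true mean if negative weights bring `W` back to zero after non-zero contributions.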

gdkrmr commented 3 years ago

Good catch. Thanks for reporting. The whole thing was written without zero weights in mind.

This also raises the question of whether we should increase the observation counter when we encounter a zero weight, or simply do nothing at all.
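The two options for the counter can be sketched side by side. `weighted_mean_state` and its `count_zeros` keyword are hypothetical names, and the sketch assumes non-negative weights, so any non-zero weight keeps `W` positive. In both policies a zero weight leaves `W` and `μ` untouched; only `n` differs.

```julia
# Hypothetical helper: fold (x, w) pairs into (n, W, μ) under one of two
# policies for zero weights (assumes all weights are non-negative).
#   count_zeros = true  -> a zero-weight observation still increments n
#   count_zeros = false -> a zero-weight observation is ignored entirely
function weighted_mean_state(xs, ws; count_zeros::Bool)
    n, W, μ = 0, 0.0, 0.0
    for (x, w) in zip(xs, ws)
        if iszero(w)
            count_zeros && (n += 1)
            continue               # a zero weight never changes W or μ
        end
        n += 1
        W += w
        μ += (w / W) * (x - μ)
    end
    return (n = n, W = W, μ = μ)
end

with_zeros    = weighted_mean_state(ones(3), [0.0, 0.5, 1.0]; count_zeros = true)
without_zeros = weighted_mean_state(ones(3), [0.0, 0.5, 1.0]; count_zeros = false)
```

Both calls give `W == 1.5` and `μ == 1.0`; `n` is 3 with the first policy and 2 with the second. Which one is right depends on what `n` is meant to report: observations received, or observations that carried information.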

We also don't guard against negative weights, so should we guard against zero weights? I personally lean toward leaving this as the user's responsibility.
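If this stays the user's responsibility, one option is to clean the data before calling `fit!`. The helper `drop_zero_weights` below is hypothetical and not part of WeightedOnlineStats; it rejects negative or non-finite weights and drops zero-weight observations.

```julia
# Hypothetical user-side preprocessing; NOT part of WeightedOnlineStats.
# Rejects negative or non-finite weights and drops zero-weight observations,
# so with the remaining (positive) weights the running sum is never zero.
function drop_zero_weights(xs::AbstractVector, ws::AbstractVector{<:Real})
    length(xs) == length(ws) || throw(DimensionMismatch("xs and ws differ in length"))
    for (i, w) in pairs(ws)
        isfinite(w) || throw(ArgumentError("weight $i is not finite: $w"))
        w >= 0 || throw(ArgumentError("weight $i is negative: $w"))
    end
    keep = ws .> 0
    return xs[keep], ws[keep]
end

xs, ws = drop_zero_weights(ones(3), [0.0, 0.5, 1.0])
```

The cleaned vectors can then be passed as `fit!(m, xs, ws)`. Note that dropping zeros this way also means they are not counted as observations, which is one answer to the counter question.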

To take this even further: negative weights could also make W == 0 later on, e.g. a weight of 1.0 followed by a weight of -1.0.
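The arithmetic can be traced with an unguarded running-mean loop. `running_mean` is a hypothetical helper that assumes the same update shape as above, not the package's code. Once `W` hits zero, `w / W` is infinite, and the `NaN` it produces persists even after `W` becomes non-zero again.

```julia
# Hypothetical unguarded update (same shape as a running weighted mean);
# it illustrates the arithmetic only and is not the package's code.
function running_mean(xs, ws)
    W, μ = 0.0, 0.0
    for (x, w) in zip(xs, ws)
        W += w
        μ += (w / W) * (x - μ)   # w / W is ±Inf (or NaN) once W hits zero
    end
    return W, μ
end

# Weights 1.0 and -1.0 cancel, so W == 0 at the second step:
# -1.0 / 0.0 == -Inf, and -Inf * (2.0 - 2.0) == NaN, which then persists.
W, μ = running_mean([2.0, 2.0, 5.0], [1.0, -1.0, 1.0])
```

The final state has `W == 1.0` but `μ` is `NaN`, the same symptom as `∑wᵢ=1.5 | value=NaN` in the original report.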