Closed nygge closed 13 years ago
folsom_sample_uniform does not generate a true random sampling of the values. Once the reservoir is full each new value is replacing an existing value in the reservoir. This causes a bias for later values. e.g.
lists:foldl(fun(V,A) -> folsom_sample_uniform:update(A,V) end, folsom_sample_uniform:new(5000),lists:seq(1,50000)). lists:sum(folsom_sample_uniform:get_values(S))/5000.
gives an arithmetic mean of ~45000 instead of the expected 25000
Exactly which algorithm in the Vitter paper is supposed to be implemented here?
I was attempting to implement "Algorithm R". Used https://github.com/codahale/metrics/blob/development/metrics-core/src/main/java/com/yammer/metrics/stats/UniformSample.java as a reference.
folsom_sample_uniform does not generate a true random sampling of the values. Once the reservoir is full each new value is replacing an existing value in the reservoir. This causes a bias for later values. e.g.
gives an arithmetic mean of ~45000 instead of the expected 25000
Exactly which algorithm in the Vitter paper is supposed to be implemented here?