Open NAThompson opened 5 years ago
@CromwellEnage : I saw on the mailing list that you are now looking to maintain this library; could you fix this bug? The fix is in Higham's book Accuracy and Stability of Numerical Algorithms, first chapter.
@NAThompson, I'll look into it and let you know when I or someone else have implemented a solution.
@CromwellEnage : This also implements the correct algorithm.
@NAThompson , variance(lazy) by definition is lazy and computes variance as the difference between the two accumulators. This issue can be closed in my opinion as variance(immediate) exists already and behaves very closely to the algorithm that you point out. See:
#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>
using namespace boost::accumulators;
template<class Real>
void golub_example()
{
std::mt19937 gen(1925235287);
int N = 50000;
std::normal_distribution<Real> dis(500.0, 0.01);
std::vector<Real> v(N);
for(size_t i = 0; i < v.size(); ++i)
{
v[i] = dis(gen);
}
accumulator_set<Real, stats<tag::variance(immediate)>> acc;
acc = std::for_each(v.begin(), v.end(), acc);
std::cout << "Variance = " << variance(acc) << "\n";
}
int main()
{
golub_example<float>();
}
Output:
Variance = 0.000100213
Now, if N=500e3 instead of 50e3, then we have another numeric problem as the variance becomes 0.7 instead of 1e-4. Another algorithm becomes necessary, like the shifted algorithm where variance is computed by subtracting from each value the value of the first data point. I have a proposal if that's of interest to anyone.
This is an example from Golub's paper:
Output: