bab2min / EigenRand

Fastest Random Distribution Generator for Eigen
https://bab2min.github.io/eigenrand/
MIT License
92 stars 13 forks source link

Performance comparison with a written eigenmvn #37

Closed JohnPekl closed 2 years ago

JohnPekl commented 2 years ago

Thanks for this useful random module.

If I want to sample from Multivariate Normal Distribution, Could you please tell me the performance difference between EigenRand and this Github module, eigenmvn? Which one is faster or does it have the same sampling speed?

In EigenRand, I can do

Rand::P8_mt19937_64 urng{ 42 };
Rand::MvNormalGen<double, 4> gen_init{ mu, cov };
Matrix<double, 4, -1> samples = gen_init.generate(urng, 10);

In eigenmvn, I can do

EigenMultivariateNormal<double> norm_init(mu, cov);
norm_init.samples(10);
bab2min commented 2 years ago

Hi @JohnPekl , Unlike eigenmvn, EigenRand uses SIMD acceleration for random number generations including uniform random, normal random, and so on. So it expected to be 2~4 times faster depending on the SIMD instructions supported by CPU. The speed difference can be measured by writing a simple comparison code as shown below:

#include <chrono>
#include <Eigen/Dense>
#include <EigenRand/EigenRand>
#include "eigenmvn.h"

int main()
{
    Eigen::Vector4d mu{ 1, 2, 3, 4 };
    Eigen::Matrix4d cov{ 
        {1, 1, 0, 0},
        {0, 2, 0, 1},
        {0, 0, 1, 0},
        {2, 0, 0, 1},
    };

    Eigen::Matrix<double, 4, -1> samples;

    Eigen::Rand::P8_mt19937_64 urng{ 42 };
    Eigen::Rand::MvNormalGen<double, 4> gen_init{ mu, cov };
    Eigen::EigenMultivariateNormal<double> norm_init{ mu, cov };

    const int repeat = 10000;
    for (int size : {10, 50, 100, 500, 1000})
    {
        samples.resize(4, size);
        double eigen_rand = 0, eigen_mvn = 0;
        for (int i = 0; i < 100; ++i)
        {
            {
                auto start = std::chrono::high_resolution_clock::now();
                samples = gen_init.generate(urng, size);
                eigen_rand += (std::chrono::high_resolution_clock::now() - start).count() / 1e+3;
            }
            {
                auto start = std::chrono::high_resolution_clock::now();
                samples = norm_init.samples(size);
                eigen_mvn += (std::chrono::high_resolution_clock::now() - start).count() / 1e+3;
            }
        }
        printf("Generating 4x%d\n- EigenRand: %.4g ms\n- EigenMVN: %.4g ms\n\n", size, eigen_rand / repeat, eigen_mvn / repeat);
    }
    return 0;
}

When compiled with AVX2 option enabled on AMD Ryzen 3700x, the result is as below.

Generating 4x10
- EigenRand: 0.00551 ms
- EigenMVN: 0.02118 ms

Generating 4x50
- EigenRand: 0.01446 ms
- EigenMVN: 0.08223 ms

Generating 4x100
- EigenRand: 0.02495 ms
- EigenMVN: 0.1615 ms

Generating 4x500
- EigenRand: 0.2728 ms
- EigenMVN: 1.035 ms

Generating 4x1000
- EigenRand: 0.4599 ms
- EigenMVN: 1.91 ms
JohnPekl commented 2 years ago

Thanks, @bab2min for your detailed answer.