Closed: mohsensadr closed this 2 months ago
In the test case, I sample half of the particles from a 2D normal distribution. The moments computed from the samples, compared with the analytical values, are:
For the first half of particles in d=0:
Exact central Moment E[x] or E[(x-mu)^p] | Computed from samples | Error
1.0000000000e-01 1.0044989988e-01 4.4989987718e-04
6.2500000000e-02 6.2694519766e-02 1.9451976633e-04
0.0000000000e+00 7.1694762114e-05 7.1694762114e-05
1.1718750000e-02 1.1695532673e-02 2.3217327030e-05
0.0000000000e+00 6.4786065986e-05 6.4786065986e-05
3.6621093750e-03 3.6128887956e-03 4.9220579378e-05
For the first half of particles in d=1:
Exact central Moment E[x] or E[(x-mu)^p] | Computed from samples | Error
-2.0000000000e-01 -2.0092720226e-01 9.2720226295e-04
2.5000000000e-01 2.4966389296e-01 3.3610703591e-04
0.0000000000e+00 1.1489533356e-03 1.1489533356e-03
1.8750000000e-01 1.8654392835e-01 9.5607165344e-04
0.0000000000e+00 1.2885425636e-03 1.2885425636e-03
2.3437500000e-01 2.3457289354e-01 1.9789354144e-04
For the second half of particles in d=0:
Exact central Moment E[x] or E[(x-mu)^p] | Computed from samples | Error
-1.0000000000e-01 -9.8145733950e-02 1.8542660496e-03
2.5000000000e-01 2.5057027928e-01 5.7027928086e-04
0.0000000000e+00 -7.5747547624e-04 7.5747547624e-04
1.8750000000e-01 1.8923814325e-01 1.7381432494e-03
0.0000000000e+00 -9.1589687425e-04 9.1589687425e-04
2.3437500000e-01 2.3875608926e-01 4.3810892619e-03
For the second half of particles in d=1:
Exact central Moment E[x] or E[(x-mu)^p] | Computed from samples | Error
2.5000000000e-01 2.5036662260e-01 3.6662259941e-04
1.0000000000e-02 1.0006965967e-02 6.9659665872e-06
0.0000000000e+00 4.6261089171e-06 4.6261089171e-06
3.0000000000e-04 3.0180975539e-04 1.8097553868e-06
0.0000000000e+00 -3.5247159871e-07 3.5247159871e-07
1.5000000000e-05 1.5074116492e-05 7.4116492283e-08
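For reference, the "exact" column can be reproduced from the closed-form central moments of a normal distribution: odd central moments vanish, and even ones are sigma^p (p-1)!!. Below is a minimal plain-Python sketch of that check, not the actual test code in this PR. The function names are hypothetical, the seed and sample count are arbitrary, and I assume the central moments are taken about the sample mean. The values mu=0.1, sigma=0.25 match the exact column of the first table (variance 6.25e-02).

```python
import math
import random


def normal_central_moment(p, sigma):
    """Exact central moment E[(x - mu)^p] of a normal distribution, for p >= 2."""
    if p % 2 == 1:
        return 0.0  # odd central moments vanish by symmetry
    # Even moments: sigma^p * (p - 1)!!
    double_factorial = math.prod(range(p - 1, 0, -2))
    return sigma**p * double_factorial


def sample_moments(xs, p_max=6):
    """Return [mean, m_2, ..., m_p_max], with m_p the p-th central sample moment."""
    n = len(xs)
    mean = sum(xs) / n
    # Assumption: moments are centred on the sample mean, not the exact mu.
    return [mean] + [sum((x - mean) ** p for x in xs) / n for p in range(2, p_max + 1)]


def moment_table(xs, mu, sigma, p_max=6):
    """Rows of (exact, computed, absolute error), mirroring the columns above."""
    exact = [mu] + [normal_central_moment(p, sigma) for p in range(2, p_max + 1)]
    computed = sample_moments(xs, p_max)
    return [(e, c, abs(e - c)) for e, c in zip(exact, computed)]


rng = random.Random(42)  # fixed seed, chosen arbitrarily for reproducibility
mu, sigma = 0.1, 0.25
xs = [rng.gauss(mu, sigma) for _ in range(100_000)]
for exact, computed, err in moment_table(xs, mu, sigma):
    print(f"{exact:.10e} {computed:.10e} {err:.10e}")
```

The errors should shrink roughly like 1/sqrt(N) with the number of samples N, which is in line with the magnitudes reported above.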
Can you indicate if the new functionality is working well on multiple ranks and on GPUs?
Yes. I updated the test case. The code gives consistent results for runs on 8 GPUs as well as on 1-2 CPU nodes (each node with 44 cores).
This allows generating samples of a distribution only for a specific index range (range policy) of the input view.
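To illustrate the idea in a language-neutral way, here is a small plain-Python analogue of range-restricted sampling. It is only a sketch and does not use the actual interface added in this PR: `fill_normal_range` and its parameters are hypothetical names, a Python list stands in for the view, and the half-open range `[begin, end)` stands in for the range policy.

```python
import random


def fill_normal_range(view, begin, end, mu, sigma, rng):
    """Overwrite view[begin:end] with normal samples; leave other entries untouched."""
    if not (0 <= begin <= end <= len(view)):
        raise ValueError("index range must lie inside the view")
    for i in range(begin, end):
        view[i] = rng.gauss(mu, sigma)


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n = 10
view = [0.0] * n
half = n // 2

# Each half of the same view is sampled from a different distribution,
# as in the test case described above (parameters for d=0).
fill_normal_range(view, 0, half, 0.1, 0.25, rng)
fill_normal_range(view, half, n, -0.1, 0.5, rng)
```

Restricting the sampler to an index range is what lets one view hold particles from several distributions without temporary copies.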