Closed adamreichold closed 3 years ago
By the way, concerning
The updated algorithms are MUCH slower than the previous version, if run sequentially.
I could not reproduce this, i.e. the krige benchmarks seem virtually unchanged between 27bc2684b20e29d499af542d8ab79a5f4ea0a28e and 425bceebc17a0c9be096a335ba5b2d205dae9fc4 (with the `par_for_each` replaced by `for_each`) when building using rustc 1.57.0-nightly 2021-09-24.
Is this the comparison you referred to, or did you mean something else?
I ran the benchmarks with the different versions a few times. Most of the times the ready-to-be-parallelized-version is about 25% to 35% slower, but every once in a while I see something like this:
```
krige error    time:   [19.291 us 19.490 us 19.734 us]
               change: [+8634.8% +8722.5% +8818.4%] (p = 0.00 < 0.05)
               Performance has regressed.

krige          time:   [17.393 us 17.452 us 17.507 us]
               change: [+11192% +11283% +11362%] (p = 0.00 < 0.05)
               Performance has regressed.

Found 14 outliers among 100 measurements (14.00%)
```
I probably shouldn't take that too seriously.
* Use `dot` instead of manually zipping arrays in `field` and `krige` modules

I actually tried to calculate the dot product by using a `reduce` or `fold` function, but did not succeed... and then I guess I once again forgot that I'm actually using a high-level language ;-) Thanks!

* In benchmark `main`, do not allocate an array of arrays only to flatten it back again

And thanks for that! I sometimes find it hard to judge when a problem is better solved without using functional programming.
I think you should, but these numbers do not look like they are the result of exchanging direct indexing with zipping. Meaning that I would expect this to happen to the indexing-based code as well.
This rather looks like the measurements were distorted by background activity on your system. It depends on what operating system you use, but you should generally look out for background services periodically consuming large amounts of resources. On notebooks, thermal throttling is also a significant issue.
Ideally, one would also choose a fixed-frequency CPU governor and disable the CPU's built-in frequency boosting. But how to do this depends on the operating system and the CPU vendor.
I actually tried to calculate the dot product by using a `reduce` or `fold` function, but did not succeed...
Just as an example in case you need this elsewhere: using `fold`, the phase computation written as

```rust
let mut phase = 0.0;
Zip::from(sample).and(pos).for_each(|&s, &p| {
    phase += s * p;
});
```

should look something like

```rust
let phase = Zip::from(sample).and(pos).fold(0.0, |acc, &s, &p| {
    acc + s * p
});
```
P.S.: I also tried parallelizing the second loop level in the krige functions, as they map pretty nicely to `fold` and `reduce`, but this always resulted in slowdowns compared to the serial version, at least for the data sizes considered by the microbenchmarks.