GeoStat-Framework / GSTools-Core

A Rust implementation of the core algorithms of GSTools.
GNU Lesser General Public License v3.0
10 stars 0 forks source link

Two miscellaneous nits #4

Closed adamreichold closed 3 years ago

adamreichold commented 3 years ago
adamreichold commented 3 years ago

By the way, concerning

The updated algorithms are MUCH slower than the previous version, if run sequentially.

I could not reproduce this, i.e. the krige benchmarks seem virtually unchanged between 27bc2684b20e29d499af542d8ab79a5f4ea0a28e and 425bceebc17a0c9be096a335ba5b2d205dae9fc4 (with the par_for_each replaced by for_each) when building using rustc 1.57.0-nightly 2021-09-24.

Is the comparison you referred to or did you mean something else?

LSchueler commented 3 years ago

By the way, concerning

The updated algorithms are MUCH slower than the previous version, if run sequentially.

I could not reproduce this, i.e. the krige benchmarks seem virtually unchanged between 27bc268 and 425bcee (with the par_for_each replaced by for_each) when building using rustc 1.57.0-nightly 2021-09-24.

Is the comparison you referred to or did you mean something else?

I ran the benchmarks with the different versions a few times. Most of the times the ready-to-be-parallelized-version is about 25% to 35% slower, but every once in a while I see something like this:

krige error             time:   [19.291 us 19.490 us 19.734 us]                         
                        change: [+8634.8% +8722.5% +8818.4%] (p = 0.00 < 0.05)
                        Performance has regressed.

krige                   time:   [17.393 us 17.452 us 17.507 us]                   
                        change: [+11192% +11283% +11362%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)

I probably shouldn't take that too seriously.

LSchueler commented 3 years ago
* Use `dot` instead of manually zipping arrays in `field` and `krige` modules

I actually tried to calculate the dot product by using a reduce or fold function, but did not succeed... and then I guess I once again forgot that I'm actually using a high-level language ;-) Thanks!

* In benchmark `main`, do not allocate an array of arrays only to flatten it back again

And thanks for that! I sometimes find it hard to judge when a problem is better to be solved without using functional programming.

adamreichold commented 3 years ago

I ran the benchmarks with the different versions a few times. Most of the times the ready-to-be-parallelized-version is about 25% to 35% slower, but every once in a while I see something like this:

krige error             time:   [19.291 us 19.490 us 19.734 us]                         
                        change: [+8634.8% +8722.5% +8818.4%] (p = 0.00 < 0.05)
                        Performance has regressed.

krige                   time:   [17.393 us 17.452 us 17.507 us]                   
                        change: [+11192% +11283% +11362%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)

I probably shouldn't take that too seriously.

I think you should, but these numbers do not look like they are the result of exchanging direct indexing with zipping. Meaning that I would expect this to happen to the indexing-based code as well.

This rather looks like the measurements were distorted by background activity on your system. It depends on what operating system you use, but you should generally look out for background services periodically consuming a large amounts of resources. On notebooks, thermal throttling is also a significant issue.

Ideally, one would also choose a fixed-frequency CPU governor and disable the CPU's built-in frequency. But how to do this depends on the operating system and the CPU vendor.

adamreichold commented 3 years ago

I actually tried to calculate the dot product by using a reduce or fold function, but did not succeed...

Just as an example if you need this elsewhere, using fold, the phase computation written as

let mut phase = 0.0;
Zip::from(sample).and(pos).for_each(|&s, &p| {
    phase += s * p;
});

should look something like

let phase = Zip::from(sample).and(pos).fold(0.0, |acc, &s, &p| {
    acc + s * p
});

using a fold.

P.S.: I also tried parallelizing the second loop level in the krige functions as they map pretty nicely to fold and reduce, but this always resulted in slowdowns compared to the serial version, at least for the data sizes considered by the microbenchmarks.