Closed: adamreichold closed this 3 years ago
I did provide a parallel variant of the inner loops of variogram_unstructured
in commit adcaf5be66ec9156f6111eed738ea4668393d06d, which I then reverted as it did not improve performance. However, I kept the revert in the history so it can be looked up later if necessary.
Furthermore, I removed the parallelisation of the estimator methods as it did not improve performance either (sometimes small improvements, sometimes small slowdowns).
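For illustration, here is a minimal sketch of the kind of parallel pair loop meant here, using Rayon; the function name, signature, and binning scheme are assumptions for this sketch, not the crate's actual API.

```rust
use rayon::prelude::*;

/// Hypothetical sketch of a parallel pairwise variogram loop: the outer loop
/// over points runs on the Rayon thread pool, each task accumulates its own
/// per-bin sums and counts, and the partial histograms are reduced at the end.
fn variogram_unstructured(f: &[f64], pos: &[(f64, f64)], bin_edges: &[f64]) -> Vec<f64> {
    let bins = bin_edges.len() - 1;

    let (sums, counts) = (0..f.len())
        .into_par_iter()
        .map(|i| {
            let mut sums = vec![0.0f64; bins];
            let mut counts = vec![0usize; bins];
            for j in (i + 1)..f.len() {
                let dx = pos[i].0 - pos[j].0;
                let dy = pos[i].1 - pos[j].1;
                let dist = (dx * dx + dy * dy).sqrt();
                // Find the distance bin, if any, that this pair falls into.
                if let Some(b) = bin_edges
                    .windows(2)
                    .position(|w| w[0] <= dist && dist < w[1])
                {
                    sums[b] += (f[i] - f[j]).powi(2);
                    counts[b] += 1;
                }
            }
            (sums, counts)
        })
        .reduce(
            || (vec![0.0f64; bins], vec![0usize; bins]),
            |(mut s1, mut c1), (s2, c2)| {
                for b in 0..bins {
                    s1[b] += s2[b];
                    c1[b] += c2[b];
                }
                (s1, c1)
            },
        );

    // Matheron-style estimator: half the mean squared difference per bin.
    sums.iter()
        .zip(&counts)
        .map(|(s, &c)| if c > 0 { 0.5 * s / c as f64 } else { 0.0 })
        .collect()
}
```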
Finally, I changed the benchmark for variogram_ma_structured
to use a random mask, as the numbers were inconsistent otherwise (much too small compared to variogram_structured
and liable to sudden changes due to tiny code changes). I suspect the compiler folds a lot of the constants in there, even though I do not understand why that would affect only this benchmark.
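A hedged sketch of what such a benchmark with a random mask could look like with Criterion; the stubbed variogram_ma_structured and the input size are illustrative assumptions, not the actual benchmark code.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use rand::Rng;

// Stand-in for the real function, only here to make the sketch self-contained.
fn variogram_ma_structured(field: &[f64], mask: &[bool]) -> f64 {
    field
        .iter()
        .zip(mask)
        .filter(|(_, &masked)| !masked)
        .map(|(v, _)| v * v)
        .sum()
}

fn bench_masked(c: &mut Criterion) {
    // A random mask keeps the input opaque to the optimiser, so the masked
    // variant cannot be folded away like an all-false constant mask could be.
    let mut rng = rand::thread_rng();
    let field: Vec<f64> = (0..10_000).map(|_| rng.gen()).collect();
    let mask: Vec<bool> = (0..10_000).map(|_| rng.gen_bool(0.5)).collect();

    c.bench_function("variogram ma structured", |b| {
        b.iter(|| variogram_ma_structured(black_box(&field), black_box(&mask)))
    });
}

criterion_group!(benches, bench_masked);
criterion_main!(benches);
```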
Added a coordinate-free implementation of variogram_directional
as well, which after parallelisation yields a 75% improvement:
variogram directional time: [227.70 ms 229.96 ms 232.36 ms]
change: [-75.354% -75.069% -74.791%] (p = 0.00 < 0.05)
Performance has improved.
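To illustrate the coordinate-free idea, here is a minimal sketch of a direction test via the normalised dot product; the helper name, signature, and tolerance convention are assumptions, not the actual implementation.

```rust
/// Hypothetical helper: decide whether a separation vector lies within an
/// angular tolerance of a given direction, without referring to individual
/// coordinate axes. Works in any dimension because only dot products and
/// norms are used.
fn in_direction(sep: &[f64], dir: &[f64], angle_tol: f64) -> bool {
    let dot: f64 = sep.iter().zip(dir).map(|(a, b)| a * b).sum();
    let norm_sep = sep.iter().map(|a| a * a).sum::<f64>().sqrt();
    let norm_dir = dir.iter().map(|a| a * a).sum::<f64>().sqrt();
    if norm_sep == 0.0 || norm_dir == 0.0 {
        return false;
    }
    // |cos(angle between sep and dir)| >= cos(tolerance)
    (dot / (norm_sep * norm_dir)).abs() >= angle_tol.cos()
}
```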
I think I should comment on this code. It's probably the worst Rust code you've ever seen ;-) .
Personally, I am not in the code judge business. It seems to be fit for purpose as you describe. Furthermore, I am rather glad that there are scientific software packages picking up Rust, and it seems a positive thing that code can be ported with reasonable effort while keeping existing idioms intact.
As you might have guessed, this was what I wanted to work on next. Which is why I already prepared the benchmarks. But thank you again for your effort and your implementations!
I am sorry if I overstepped. I just had some quiet minutes during vacation and I find this sort of rewrite soothing. It might all be in vain in any case if the f.dim().0 versus dim issue makes this way of slicing things inapplicable.
Have you consciously decided to use new random masks for each benchmark?
No, that was just the minimal change that avoided the inconsistencies. We could fix the seed of the PRNG to get reproducible inputs? Though it did not seem to influence the measurements significantly. I guess we are not at the level of optimization where the contents of the data (in contrast to its shape) significantly alter the performance.
We could fix the seed of the PRNG to get reproducible inputs?
Amended that to the benchmarking commit.
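For reference, a small sketch of how a fixed seed could make the benchmark inputs reproducible; the helper name and the seed value are illustrative assumptions.

```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Hypothetical helper: with a fixed seed the random mask is identical across
/// benchmark runs, so successive measurements compare exactly the same input.
fn reproducible_mask(len: usize) -> Vec<bool> {
    let mut rng = StdRng::seed_from_u64(42); // arbitrary but fixed seed
    (0..len).map(|_| rng.gen_bool(0.5)).collect()
}
```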
I think I should comment on this code. It's probably the worst Rust code you've ever seen ;-) .
Personally, I am not in the code judge business. It seems to be fit for purpose as you describe. Furthermore, I am rather glad that there are scientific software packages picking up Rust, and it seems a positive thing that code can be ported with reasonable effort while keeping existing idioms intact.
That is definitely a nice way of seeing things.
As you might have guessed, this was what I wanted to work on next. Which is why I already prepared the benchmarks. But thank you again for your effort and your implementations!
I am sorry if I overstepped. I just had some quiet minutes during vacation and I find this sort of rewrite soothing. It might all be in vain in any case if the f.dim().0 versus dim issue makes this way of slicing things inapplicable.

No no no! As I already told you, I am really busy at the moment and it would be a terrible shame if this project gets stuck. I am honestly very thankful for your contributions. It's great that you simply picked up where I had to stop (for quite some time now). I'll cross my fingers that we will not run into that trouble. But I should really not have a look at that issue this evening.
Have you consciously decided to use new random masks for each benchmark?
No, that was just the minimal change that avoided the inconsistencies. We could fix the seed of the PRNG to get reproducible inputs? Though it did not seem to influence the measurements significantly. I guess we are not at the level of optimization where the contents of the data (in contrast to its shape) significantly alter the performance.
I actually don't have a very strong opinion on that. The criterion benchmarks are statistical benchmarks anyway, so a new seed for each bench run should be fine. But on the other hand, you never know?!
Index-free representations brought speed-ups, but below 10%; the trivially parallel variants of those algorithms yield speed-ups above 80% on laptop-class hardware: