Closed: adamreichold closed this 3 years ago
I did provide a parallel variant of the inner loops of variogram_unstructured
in commit adcaf5be66ec9156f6111eed738ea4668393d06d, which I then reverted as it did not improve performance. However, I kept the revert in the history so it can be looked up later if necessary.
Furthermore, I removed the parallelisation of the estimator methods as it did not improve performance either (sometimes small improvements, sometimes small slowdowns).
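For illustration, here is a minimal sketch of the kind of parallel pair loop meant here, using Rayon; the function name, signature, and binning scheme are assumptions for this sketch, not the crate's actual API.

```rust
use rayon::prelude::*;

/// Hypothetical sketch of a parallel pairwise variogram loop: the outer loop
/// over points runs on the Rayon thread pool, each task accumulates its own
/// per-bin sums and counts, and the partial histograms are reduced at the end.
fn variogram_unstructured(f: &[f64], pos: &[(f64, f64)], bin_edges: &[f64]) -> Vec<f64> {
    let bins = bin_edges.len() - 1;

    let (sums, counts) = (0..f.len())
        .into_par_iter()
        .map(|i| {
            let mut sums = vec![0.0f64; bins];
            let mut counts = vec![0usize; bins];
            for j in (i + 1)..f.len() {
                let dx = pos[i].0 - pos[j].0;
                let dy = pos[i].1 - pos[j].1;
                let dist = (dx * dx + dy * dy).sqrt();
                // Find the distance bin, if any, that this pair falls into.
                if let Some(b) = bin_edges
                    .windows(2)
                    .position(|w| w[0] <= dist && dist < w[1])
                {
                    sums[b] += (f[i] - f[j]).powi(2);
                    counts[b] += 1;
                }
            }
            (sums, counts)
        })
        .reduce(
            || (vec![0.0f64; bins], vec![0usize; bins]),
            |(mut s1, mut c1), (s2, c2)| {
                for b in 0..bins {
                    s1[b] += s2[b];
                    c1[b] += c2[b];
                }
                (s1, c1)
            },
        );

    // Matheron-style estimator: half the mean squared difference per bin.
    sums.iter()
        .zip(&counts)
        .map(|(s, &c)| if c > 0 { 0.5 * s / c as f64 } else { 0.0 })
        .collect()
}
```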
Finally, I changed the benchmark for variogram_ma_structured
to use a random mask, as the numbers were inconsistent otherwise (much too small compared to variogram_structured
and liable to sudden changes due to tiny code changes). I suspect the compiler folds a lot of the constants in there, even though I do not understand why that would affect only this benchmark.
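A hedged sketch of what such a benchmark with a random mask could look like with Criterion; the stubbed variogram_ma_structured and the input size are illustrative assumptions, not the actual benchmark code.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use rand::Rng;

// Stand-in for the real function, only here to make the sketch self-contained.
fn variogram_ma_structured(field: &[f64], mask: &[bool]) -> f64 {
    field
        .iter()
        .zip(mask)
        .filter(|(_, &masked)| !masked)
        .map(|(v, _)| v * v)
        .sum()
}

fn bench_masked(c: &mut Criterion) {
    // A random mask keeps the input opaque to the optimiser, so the masked
    // variant cannot be folded away like an all-false constant mask could be.
    let mut rng = rand::thread_rng();
    let field: Vec<f64> = (0..10_000).map(|_| rng.gen()).collect();
    let mask: Vec<bool> = (0..10_000).map(|_| rng.gen_bool(0.5)).collect();

    c.bench_function("variogram ma structured", |b| {
        b.iter(|| variogram_ma_structured(black_box(&field), black_box(&mask)))
    });
}

criterion_group!(benches, bench_masked);
criterion_main!(benches);
```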
Added a coordinate-free implementation of variogram_directional
as well, which after parallelisation yields a 75% improvement:
variogram directional time: [227.70 ms 229.96 ms 232.36 ms]
change: [-75.354% -75.069% -74.791%] (p = 0.00 < 0.05)
Performance has improved.
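To illustrate the coordinate-free idea, here is a minimal sketch of a direction test via the normalised dot product; the helper name, signature, and tolerance convention are assumptions, not the actual implementation.

```rust
/// Hypothetical helper: decide whether a separation vector lies within an
/// angular tolerance of a given direction, without referring to individual
/// coordinate axes. Works in any dimension because only dot products and
/// norms are used.
fn in_direction(sep: &[f64], dir: &[f64], angle_tol: f64) -> bool {
    let dot: f64 = sep.iter().zip(dir).map(|(a, b)| a * b).sum();
    let norm_sep = sep.iter().map(|a| a * a).sum::<f64>().sqrt();
    let norm_dir = dir.iter().map(|a| a * a).sum::<f64>().sqrt();
    if norm_sep == 0.0 || norm_dir == 0.0 {
        return false;
    }
    // |cos(angle between sep and dir)| >= cos(tolerance)
    (dot / (norm_sep * norm_dir)).abs() >= angle_tol.cos()
}
```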
I think I should comment on this code. It's probably the worst Rust code you've ever seen ;-) .
Personally, I am not in the code judge business. It seems to be fit for purpose as you describe. Furthermore, I am rather glad that there are scientific software packages picking up Rust, and it seems a positive thing that code can be ported with reasonable effort while keeping existing idioms intact.
As you might have guessed, this was what I wanted to work on next. Which is why I already prepared the benchmarks. But thank you again for your effort and your implementations!
I am sorry if I overstepped. I just had some quiet minutes during vacation and I find this sort of rewrite soothing. It might all be in vain in any case if the f.dim().0 versus dim issue makes this way of slicing things inapplicable.
Have you consciously decided to use new random masks for each benchmark?
No, that was just the minimal change that avoided the inconsistencies. We could fix the seed of the PRNG to get reproducible inputs? Though it did not seem to influence the measurements significantly. I guess we are not at the level of optimization where the contents of the data (in contrast to its shape) significantly alter the performance.
We could fix the seed of the PRNG to get reproducible inputs?
Amended that to the benchmarking commit.
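For reference, a small sketch of how a fixed seed could make the benchmark inputs reproducible; the helper name and the seed value are illustrative assumptions.

```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Hypothetical helper: with a fixed seed the random mask is identical across
/// benchmark runs, so successive measurements compare exactly the same input.
fn reproducible_mask(len: usize) -> Vec<bool> {
    let mut rng = StdRng::seed_from_u64(42); // arbitrary but fixed seed
    (0..len).map(|_| rng.gen_bool(0.5)).collect()
}
```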
I think I should comment on this code. It's probably the worst Rust code you've ever seen ;-) .
Personally, I am not in the code judge business. It seems to be fit for purpose as you describe. Furthermore, I am rather glad that there are scientific software packages picking up Rust, and it seems a positive thing that code can be ported with reasonable effort while keeping existing idioms intact.
That is definitely a nice way of seeing things.
As you might have guessed, this was what I wanted to work on next. Which is why I already prepared the benchmarks. But thank you again for your effort and your implementations!
I am sorry if I overstepped. I just had some quiet minutes during vacation and I find this sort of rewrite soothing. It might all be in vain in any case if the f.dim().0 versus dim issue makes this way of slicing things inapplicable.

No no no! As I already told you, I am really busy at the moment and it would be a terrible shame if this project gets stuck. I am honestly very thankful for your contributions. It's great that you simply picked up where I had to stop (for quite some time now). I'll cross my fingers that we will not run into that trouble. But I should really not have a look at that issue this evening.
Have you consciously decided to use new random masks for each benchmark?
No, that was just the minimal change that avoided the inconsistencies. We could fix the seed of the PRNG to get reproducible inputs? Though it did not seem to influence the measurements significantly. I guess we are not at the level of optimization where the contents of the data (in contrast to its shape) significantly alter the performance.
I actually don't have a very strong opinion on that. The criterion benchmarks are statistical benchmarks anyway, so a new seed for each bench run should be fine. But on the other hand, you never know?!
Index-free representations brought speed-ups, but below 10%; the trivially parallel variants of those algorithms yield speed-ups above 80% on laptop-class hardware: