ecmwf / atlas

A library for numerical weather prediction and climate modelling
https://sites.ecmwf.int/docs/atlas
Apache License 2.0

Optimise array::helpers::ForEach for small arrays. #133

Closed by odlomax 1 year ago

odlomax commented 1 year ago

Hi @wdeconinck,

I've found that the array ForEach method is particularly slow when processing small arrays (tested with N = 100). I believe this is because processing execution_policy configs carries a relatively high fixed overhead on every call.
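To make the suspected cost concrete, here is a minimal sketch that assumes the config is a string-keyed map. `Config`, `for_each_config` and the policy names `"sequenced"`/`"parallel"` are hypothetical stand-ins, not Atlas's config class or ForEach code. The point is that the policy lookup and string comparisons happen on every call, before any element is touched, so for N = 100 that fixed cost can be comparable to the loop itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a run-time configuration object (NOT Atlas's
// config class): policy options are stored as strings under string keys.
using Config = std::map<std::string, std::string>;

// Run-time policy selection: every call performs a map lookup and string
// comparisons before the first element is processed. With N = 100 elements,
// this fixed cost can be comparable to the loop body itself.
template <typename Function>
void for_each_config(const Config& config, std::size_t n, Function&& f) {
  const auto it = config.find("execution_policy");
  const std::string policy =
      (it == config.end()) ? std::string("sequenced") : it->second;
  if (policy == "sequenced") {
    for (std::size_t i = 0; i < n; ++i) f(i);
  } else if (policy == "parallel") {
    // A parallel backend would be dispatched here; this sketch stays serial.
    for (std::size_t i = 0; i < n; ++i) f(i);
  } else {
    throw std::invalid_argument("unknown execution_policy: " + policy);
  }
}
```

Called inside an outer loop over columns, this lookup is repeated once per column.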

I've done a couple of things to alleviate this:
1) allowed constexpr (compile-time) policy selection when you provide an execution_policy object;
2) implemented perfect forwarding of the ArrayView objects (I don't think this did much for performance, but it was useful to learn how to do it!).
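A sketch of what compile-time policy selection can look like, assuming policies are represented by empty tag types. The names `policy::sequenced`, `policy::unsequenced`, `for_each_index` and `with_policy` are hypothetical and are not the PR's code. Because the policy is a type, `if constexpr` resolves the branch at compile time. A run-time policy name can still be supported by converting it to a tag once, outside the hot loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical execution-policy tag types (empty structs); illustrative
// names, not Atlas's API.
namespace policy {
struct sequenced {};
struct unsequenced {};
}  // namespace policy

// Compile-time selection: the policy is a template parameter, so the branch
// below is chosen by the compiler and costs nothing at run time.
template <typename ExecutionPolicy, typename Function>
void for_each_index(ExecutionPolicy, std::size_t n, Function&& f) {
  if constexpr (std::is_same_v<ExecutionPolicy, policy::unsequenced>) {
    // An unsequenced implementation could add vectorisation hints here.
    for (std::size_t i = 0; i < n; ++i) f(i);
  } else {
    // Any other tag type fails to compile instead of failing at run time.
    static_assert(std::is_same_v<ExecutionPolicy, policy::sequenced>,
                  "unsupported execution policy");
    for (std::size_t i = 0; i < n; ++i) f(i);
  }
}

// Optional bridge for run-time names: resolve the string ONCE, then pass a
// compile-time tag to the continuation, which runs the (nested) loops.
template <typename Continuation>
void with_policy(const std::string& name, Continuation&& k) {
  if (name == "unsequenced") {
    k(policy::unsequenced{});
  } else {
    k(policy::sequenced{});
  }
}
```

Both branches run the same serial loop in this sketch. The design point is only that the choice is made by the compiler, so an inner call on a 100-level column pays no dispatch cost.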

I've added two performance tests:
1) "nested" performs a ForEach over the 50000 columns and, for each column, another ForEach over the 100 levels. The execution policy is selected at compile time (constexpr).
2) "nested, config" does the same as above, but determines the policy at run time via a config.
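The structure of the "nested" test can be sketched as follows. This is not the PR's benchmark code. It assumes a row-major buffer with levels contiguous within each column and uses a hypothetical sequential helper `sequential_for_each`. The inner call happens once per column, so any fixed per-call cost is multiplied by the number of columns (50000 in the test).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sequential helper standing in for a ForEach with a
// compile-time policy.
template <typename Function>
void sequential_for_each(std::size_t n, Function&& f) {
  for (std::size_t i = 0; i < n; ++i) f(i);
}

// Sketch of the "nested" pattern: a += b on arrays of shape
// [columns][levels], stored row-major so that levels are contiguous
// (an assumption about the layout). Returns the elapsed time in seconds.
double nested_addition(std::vector<double>& a, const std::vector<double>& b,
                       std::size_t columns, std::size_t levels) {
  assert(a.size() == columns * levels && b.size() == a.size());
  const auto start = std::chrono::steady_clock::now();
  sequential_for_each(columns, [&](std::size_t col) {
    // One inner call per column: a fixed per-call overhead is paid
    // `columns` times in total.
    sequential_for_each(levels, [&](std::size_t lev) {
      a[col * levels + lev] += b[col * levels + lev];
    });
  });
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```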

Elapsed time: Addition; raw pointer               = 0.0091302s
Elapsed time: Addition; for loop (i, j)           = 0.00921452s ;   relative to baseline : 100.924%
Elapsed time: Addition; for loop (j, i)           = 0.0478493s  ;   relative to baseline : 524.077%
Elapsed time: Addition; for each (columns)        = 0.00907069s ;   relative to baseline : 99.3483%
Elapsed time: Addition; for each (levels)         = 0.0481194s  ;   relative to baseline : 527.036%
Elapsed time: Addition; for each (all elements)   = 0.00914668s ;   relative to baseline : 100.181%
Elapsed time: Addition; for each (nested)         = 0.00920637s ;   relative to baseline : 100.834%
Elapsed time: Addition; for each (nested, config) = 0.118888s   ;   relative to baseline : 1302.14%

Elapsed time: Trig    ; raw pointer               = 0.312925s
Elapsed time: Trig    ; for loop (i, j)           = 0.319474s   ;   relative to baseline : 102.093%
Elapsed time: Trig    ; for loop (j, i)           = 0.502402s   ;   relative to baseline : 160.55%
Elapsed time: Trig    ; for each (columns)        = 0.316493s   ;   relative to baseline : 101.14%
Elapsed time: Trig    ; for each (levels)         = 0.435909s   ;   relative to baseline : 139.301%
Elapsed time: Trig    ; for each (all elements)   = 0.315607s   ;   relative to baseline : 100.857%
Elapsed time: Trig    ; for each (nested)         = 0.318103s   ;   relative to baseline : 101.655%
Elapsed time: Trig    ; for each (nested, config) = 0.431594s   ;   relative to baseline : 137.923%

As you can see, the benefit of using a constexpr execution policy is dramatic when performing a large number of calls on small arrays: in the addition test, "nested" runs at 100.8% of the baseline, compared with 1302% for "nested, config".
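A rough back-of-the-envelope estimate from the timings above. It assumes the inner ForEach is called once per column (50000 times) and attributes the whole gap between "nested, config" and "nested" to config handling. Both tests then give roughly the same per-call overhead, about 12 times the work of adding one 100-level column:

```latex
\begin{aligned}
\Delta t_{\text{call}} &\approx \frac{t_{\text{nested, config}} - t_{\text{nested}}}{N_{\text{columns}}} \\
\text{Addition:}\quad \Delta t_{\text{call}} &\approx \frac{0.118888\,\mathrm{s} - 0.00920637\,\mathrm{s}}{50000} \approx 2.2\,\mu\mathrm{s} \\
\text{Trig:}\quad \Delta t_{\text{call}} &\approx \frac{0.431594\,\mathrm{s} - 0.318103\,\mathrm{s}}{50000} \approx 2.3\,\mu\mathrm{s} \\
\text{Addition work per column:}\quad \frac{t_{\text{nested}}}{N_{\text{columns}}} &= \frac{0.00920637\,\mathrm{s}}{50000} \approx 0.18\,\mu\mathrm{s}
\end{aligned}
```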

Thankfully, the config overhead is negligible when dealing with normal-sized NWP fields.

The only (intended) change to the interface is that you can now supply ArrayViews to the ForEach::Apply method using std::tie and std::forward_as_tuple as well as std::make_tuple.
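The three tuple helpers differ in what they store, which is why a forwarding-reference parameter is useful for accepting all of them. This is a minimal sketch that assumes a hypothetical non-owning `View` in place of an ArrayView and a hypothetical `apply_each` in place of ForEach::Apply. Neither is Atlas's code. For lvalues `a` and `b`, `std::make_tuple(a, b)` stores copies (`std::tuple<View, View>`), `std::tie(a, b)` stores lvalue references (`std::tuple<View&, View&>`), and `std::forward_as_tuple` keeps each argument's value category (`View&` for lvalues, `View&&` for temporaries).

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an ArrayView: a non-owning view of a 1D buffer.
struct View {
  double* data;
  std::size_t size;
  double& operator()(std::size_t i) const { return data[i]; }
};

// Takes the tuple of views by forwarding reference, so tuples built with
// std::make_tuple (values), std::tie (lvalue references) or
// std::forward_as_tuple (references, value category kept) are all accepted
// without an extra copy of the tuple.
template <typename ViewTuple, typename Function>
void apply_each(ViewTuple&& views, Function&& f) {
  const std::size_t n = std::get<0>(views).size;
  for (std::size_t i = 0; i < n; ++i) {
    // Expand the tuple and call f(view0(i), view1(i), ...).
    std::apply([&](auto&... v) { f(v(i)...); }, views);
  }
}
```

Because `View` is non-owning, even the copies made by `std::make_tuple` still write to the original buffer. The reference-based helpers simply avoid copying the view objects.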

closes #134

odlomax commented 1 year ago

Quoting wdeconinck: "Nice improvements! I like that you don't have the arrayViewsCopy anymore."

Thanks, Willem. Your suggestions make the code a lot cleaner (and hopefully more readable). I think I know how forwarding works now!

wdeconinck commented 1 year ago

FYI, before you go deeper, I have already finished working on a field::for_each API. I was just about to create a PR, but then you dropped this. I'll have to reintegrate.

odlomax commented 1 year ago


Whoops, I hope I didn't create much more work. I think I've gone more than deep enough!