ecmwf / atlas

A library for numerical weather prediction and climate modelling
https://sites.ecmwf.int/docs/atlas
Apache License 2.0

Implement field::for_each capabilities #139

Closed wdeconinck closed 1 year ago

wdeconinck commented 1 year ago

Develop field::for_each functionality. For instance:

Field f1, f2, f3;
// ... assign f1, f2 ...
field::for_each_value(f1, f2, f3, [](const double& x1, const double& x2, double& x3) {
    x3 = x1 + x2;
});
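
Conceptually, for rank-1 fields this visits corresponding elements of each field's underlying array in lockstep. A minimal sketch of the idea (not the actual implementation; it assumes double-valued rank-1 fields):

#include "atlas/array.h"
#include "atlas/field.h"

using namespace atlas;

template <typename Function>
void for_each_value_rank1_sketch(const Field& f1, const Field& f2, Field& f3,
                                 Function&& function) {
    // Typed views over each field's underlying array.
    auto x1 = array::make_view<const double, 1>(f1);
    auto x2 = array::make_view<const double, 1>(f2);
    auto x3 = array::make_view<double, 1>(f3);
    // Apply the user function to corresponding elements in lockstep.
    for (idx_t j = 0; j < x1.shape(0); ++j) {
        function(x1(j), x2(j), x3(j));
    }
}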

We can also loop over Field::horizontal_dimension() only, extracting column views. Note that Field::horizontal_dimension() returns a std::vector<idx_t> to cater for multiple horizontal dimensions in a field, which need not be adjacent (e.g. BlockStructuredColumns fields of dimensions {nblock,nlev,nproma}, where nblock and nproma are the horizontal dimensions).

Field f1, f2, f3;
// ... assign f1, f2 ...
field::for_each_column(f1, f2, f3, [](array::View<const double,1>&& x1,
                                      array::View<const double,1>&& x2,
                                      array::View<double,1>&& x3) {
    const idx_t nlev = x1.size();
    for (idx_t jlev=0; jlev<nlev; ++jlev) {
        x3(jlev) = x1(jlev) + x2(jlev);
    }
});
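
For illustration (a hypothetical sketch; the shape is an assumption), the horizontal dimensions of a BlockStructuredColumns field would be reported as:

// For a BlockStructuredColumns field of shape {nblock, nlev, nproma},
// the two horizontal dimensions are the non-adjacent indices 0 and 2:
std::vector<idx_t> hdims = f1.horizontal_dimension();  // e.g. {0, 2}
// Each extracted rank-1 column view then spans the nlev dimension.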

If there are extra dimensions (e.g. a "variable" dimension), the column views can be multidimensional:

Field f1, f2, f3;
// ... assign f1, f2 ...
field::for_each_column(f1, f2, f3, [](array::View<const double,2>&& x1,
                                      array::View<const double,2>&& x2,
                                      array::View<double,2>&& x3) {
    const idx_t nlev = x1.shape(0);
    const idx_t nvar = x1.shape(1);
    for (idx_t jlev=0; jlev<nlev; ++jlev) {
        for (idx_t jvar=0; jvar<nvar; ++jvar) {
            x3(jlev,jvar) = x1(jlev,jvar) + x2(jlev,jvar);
        }
    }
});

There are further functions that accept a masking function or a masking field (such as a ghost field). The ghost field is typically a one-dimensional field corresponding to the horizontal_dimension. If there are multiple horizontal_dimensions, the ghost field can either be multi-dimensional, or a one-dimensional view over the flattened horizontal_dimensions with indexing like (ni*j + i).

Field f1, f2, f3; 
Field ghost;
// ... assign f1, f2, ghost ...
field::for_each_value_masked(ghost, f1, f2, f3, [](const double& x1, const double& x2, double& x3) {
    x3 = x1 + x2;
});
Field f1, f2, f3;
Field ghost;
// ... assign f1, f2, ghost ...
field::for_each_column_masked(ghost, f1, f2, f3, [](array::View<const double,1>&& x1,
                                                    array::View<const double,1>&& x2,
                                                    array::View<double,1>&& x3) {
    const idx_t nlev = x1.size();
    for (idx_t jlev=0; jlev<nlev; ++jlev) {
        x3(jlev) = x1(jlev) + x2(jlev);
    }
});

If a masking function is supplied instead of a ghost field, and only the first field dimension is to be masked, then define it as:

auto mask = [](idx_t i, auto&&... args) {
    // mask depending on i only.
};
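
For illustration, a hypothetical sketch of such a mask built from a ghost view, masking on the first index only (the variable names, the int-valued ghost view, and passing the mask in place of the ghost field are assumptions based on the description above):

Field f1, f2, f3;
Field ghost;
// ... assign f1, f2, ghost ...
auto ghost_view = array::make_view<const int, 1>(ghost);
auto mask = [ghost_view](idx_t i, auto&&... /*ignored*/) {
    return ghost_view(i);  // nonzero: skip this point
};
field::for_each_value_masked(mask, f1, f2, f3, [](const double& x1, const double& x2, double& x3) {
    x3 = x1 + x2;
});
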
wdeconinck commented 1 year ago

Please review @odlomax @ytremolet

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 88.80% and project coverage change: +0.04% :tada:

Comparison is base (9a2def8) 78.18% compared to head (a61d667) 78.22%.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##           develop     #139    +/-   ##
=========================================
+ Coverage    78.18%   78.22%   +0.04%
=========================================
  Files          817      802      -15
  Lines        59325    59645     +320
=========================================
+ Hits         46385    46660     +275
- Misses       12940    12985      +45
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/atlas/field/detail/for_each.h | 53.08% <53.08%> (ø) | |
| src/atlas/field/for_each.h | 79.51% <79.51%> (ø) | |
| src/tests/field/test_field_foreach.cc | 99.69% <99.69%> (ø) | |
| src/atlas/array/helpers/ArrayForEach.h | 96.29% <100.00%> (+3.11%) | :arrow_up: |

... and 155 files with indirect coverage changes

:umbrella: View full report in Codecov by Sentry.

DJDavies2 commented 1 year ago

I am getting build failures when building against oops (JCSDA) with this:

In file included from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/ArrayView.h:18:0,
                 from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/functionspace/CubedSphereColumns.h:13,
                 from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/functionspace.h:16,
                 from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/base/GeometryData.h:15,
                 from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/base/Geometry.h:23,
                 from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/assimilation/ControlVariable.h:20,
                 from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/assimilation/ControlIncrement.h:21,
                 from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/assimilation/MinimizerUtils.h:16,
                 from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/assimilation/MinimizerUtils.cc:8:
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:66: error: 'is_convertible_v' is not a member of 'std'
 template <typename ValueTp, typename = std::enable_if_t<std::is_convertible_v<ValueTp,value_type>>>
                                                              ^~~~
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:66: note: suggested alternative: 'is_convertible'
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:66: error: 'is_convertible_v' is not a member of 'std'
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:66: note: suggested alternative: 'is_convertible'
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:102: error: template argument 1 is invalid
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:105: error: expected unqualified-id before '>' token
gmake[2]: *** [oops/src/CMakeFiles/oops.dir/oops/generic/UnstructuredInterpolator.cc.o] Error 1
gmake[2]: *** [oops/src/CMakeFiles/oops.dir/oops/assimilation/MinimizerUtils.cc.o] Error 1
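
For context: 'is_convertible_v' is the C++17 variable-template form of the trait, while the C++11/14 spelling is std::is_convertible<...>::value, so this error is typical of compiling the header without -std=c++17. A minimal illustration of the difference (an editorial sketch, independent of the build above):

#include <type_traits>

// Available since C++11:
static_assert(std::is_convertible<int, double>::value, "C++11/14 spelling");

// The _v shorthand used in NativeArrayView.h requires C++17:
#if __cplusplus >= 201703L
static_assert(std::is_convertible_v<int, double>, "C++17 spelling");
#endif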

wdeconinck commented 1 year ago

@DJDavies2 Thanks for verifying compilation. You have observed this in multiple PRs now, so could you check whether it is triggered in the develop branch as well? If so, please create a GitHub issue specifically for the develop branch, mentioning the compiler, its version, and the compile line including include directories and flags.

I have added to the develop branch the public propagation of the -std=c++17 flag to downstream projects, as I believe that's what is missing. A reintegration of this branch with develop should then hopefully fix this.

wdeconinck commented 1 year ago

@ytremolet when you have time could you comment? Otherwise I'll just merge this in soon.

DJDavies2 commented 1 year ago

> @DJDavies2 Thanks for verifying compilation. You have observed this in multiple PRs now, so could you check whether it is triggered in the develop branch as well? If so, please create a GitHub issue specifically for the develop branch, mentioning the compiler, its version, and the compile line including include directories and flags.
>
> I have added to the develop branch the public propagation of the -std=c++17 flag to downstream projects, as I believe that's what is missing. A reintegration of this branch with develop should then hopefully fix this.

I have tested against the head of develop and it seems to work now; there have been changes updating to -std=c++17 in some downstream repositories, so that has probably fixed this issue.

wdeconinck commented 1 year ago

I have added performance tests with OMP_NUM_THREADS=4, with the following output:

timing with for loops seq  = 0.00333232  (triple nested for-loop, sequential)
timing with execution::seq = 0.0790008   (field::for_each_value, sequential)
timing with for loops par  = 0.00102567  (triple nested for-loop with OpenMP for for outermost loop)
timing with execution::par = 0.0215601   (field::for_each_value, parallel)

The difference between plain for-loops and field::for_each is pretty dramatic (field::for_each is a factor of 20-24 slower)! I don't know why the ArrayForEach tests showed no difference while we see such a large one here.

This should above all not stop us from using this capability. The benefit is that optimisations can and WILL be applied in following developments.
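
For reference, a rough sketch of the two variants being compared above (field names, shapes, and the policy spelling follow the examples in this thread; treat it as an assumption rather than the actual test code):

auto x1 = array::make_view<const double, 3>(f1);
auto x2 = array::make_view<const double, 3>(f2);
auto x3 = array::make_view<double, 3>(f3);

// "for loops": plain triple nested loop over the views.
for (idx_t i = 0; i < x1.shape(0); ++i) {
    for (idx_t j = 0; j < x1.shape(1); ++j) {
        for (idx_t k = 0; k < x1.shape(2); ++k) {
            x3(i, j, k) = x1(i, j, k) + x2(i, j, k);
        }
    }
}

// "execution::par": the same operation via field::for_each_value
// with a parallel execution policy.
field::for_each_value(execution::par, f1, f2, f3,
                      [](const double& a, const double& b, double& c) { c = a + b; });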

DJDavies2 commented 1 year ago

The new test fails when I run it:

Running case 2: test field::for_each_value_masked; horizontal_dimension {0,2} mask_rank1 ...
[0] Test "test field::for_each_value_masked; horizontal_dimension {0,2} mask_rank1" failed with unhandled eckit::Exception: Assertion failed: (std::is_invocable_v<Mask,MaskArgs...>) in apply, line 265 of /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/helpers/ArrayForEach.h
[0] Stack trace: backtrace [1] stack has 38 addresses
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libeckit.so+eckit::BackTrace::dump[abi:cxx11]())0x91
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libeckit.so+eckit::Exception::Exception())0xb6
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libeckit.so+eckit::AssertionFailed::AssertionFailed(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, eckit::CodeLocation const&))0x41
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libeckit.so+eckit::handle_assert(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, eckit::CodeLocation const&))0x2a5
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libatlas.so.0.33+atlas::throw_AssertionFailed(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, eckit::CodeLocation const&))0x53
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/atlas/src/tests/field/atlas_test_field_foreach+atlas::detail::Assert(bool, char const*, char const*, int, char const*))0xa1

and similar for rank2.

wdeconinck commented 1 year ago

@DJDavies2 Is this with gnu 7.3? Which build-type? (You could print output of atlas --info)

This failure should not have anything to do with the new test. Not sure how this occurs. It seems to pass on so many other compilers and build-types, including gnu 7.5.

@odlomax could you perhaps try to investigate or debug this on this particular configuration?

DJDavies2 commented 1 year ago

Yes, this is gnu 7.3.

odlomax commented 1 year ago

> @DJDavies2 Is this with gnu 7.3? Which build-type? (You could print output of atlas --info)
>
> This failure should not have anything to do with the new test. Not sure how this occurs. It seems to pass on so many other compilers and build-types, including gnu 7.5.
>
> @odlomax could you perhaps try to investigate or debug this on this particular configuration?

I'm just having a look. Will get back to you later today.

odlomax commented 1 year ago

> > @DJDavies2 Is this with gnu 7.3? Which build-type? (You could print output of atlas --info)
> >
> > This failure should not have anything to do with the new test. Not sure how this occurs. It seems to pass on so many other compilers and build-types, including gnu 7.5.
> >
> > @odlomax could you perhaps try to investigate or debug this on this particular configuration?
>
> I'm just having a look. Will get back to you later today.

I might have found a bug. If I can confirm it, shall I PR a solution into this branch?

odlomax commented 1 year ago

I think I've fixed it: https://github.com/ecmwf/atlas/pull/145

tl;dr: GCC 7.3 is old and rubbish.

wdeconinck commented 1 year ago

With the new changes from d97f973, the field::for_each_value timings did not seem to improve:

timing with for loops seq  = 0.0035155
timing with for loops par  = 0.00106079
timing with execution::seq = 0.092391
timing with execution::par = 0.0240693
odlomax commented 1 year ago

> With the new changes from d97f973, the field::for_each_value timings did not seem to improve:
>
> timing with for loops seq  = 0.0035155
> timing with for loops par  = 0.00106079
> timing with execution::seq = 0.092391
> timing with execution::par = 0.0240693

Strange. I got the following when I ran the test in release:

37: timing with for loops seq  = 0.000203149
37: timing with for loops par  = 5.73367e-05
37: timing with execution::seq = 0.000196719
37: timing with execution::par = 5.94674e-05

Admittedly, there's a fair bit of variation between runs.

wdeconinck commented 1 year ago

Apologies @odlomax, I do indeed recover the raw for-loop performance when optimisations are on. I had hacked the compile flags in my build and forgotten to reset them.

wdeconinck commented 1 year ago

@DJDavies2 Please let me know if you still have issues with the updated branch, and thanks once more for spotting this!

DJDavies2 commented 1 year ago

The test works now, thanks.

odlomax commented 1 year ago

Great success!

wdeconinck commented 1 year ago

Finally merged. Thanks all for your help!