ecmwf / atlas

A library for numerical weather prediction and climate modelling
https://sites.ecmwf.int/docs/atlas
Apache License 2.0
115 stars 42 forks source link

`Field.levels()` is expensive #170

Closed mo-joshuacolclough closed 9 months ago

mo-joshuacolclough commented 9 months ago

Is your feature request related to a problem? Please describe.

It has been noticed that looping through fields using field.levels(), e.g:

for (atlas::idx_t horizontal = 0; horizontal < someFieldView.shape(0); ++horizontal) {
  for (atlas::idx_t vertical = 0 ; vertical < someField.levels(); ++ vertical) {
    // do something...

is significantly (~50-100x !) slower than first retrieving the levels from the field:

const atlas::idx_t fieldLevels = someField.levels()
for (atlas::idx_t horizontal = 0; horizontal < someFieldView.shape(0); ++horizontal) {
  for (atlas::idx_t vertical = 0 ; vertical < fieldLevels; ++ vertical) {
    // ...

Alternatively, using someFieldView.shape(1) as a proxy for the number of levels is also significantly faster.

Looking at the Field.levels() function, it seems that the levels is retrieved from a Metadata object which I imagine is expensive. Is there a reason it is done this way?

Describe the solution you'd like

If there's not a strong reason for having the levels() pull a value from a Metadata object, could this be a member varaible in Field?

Describe alternatives you've considered

If it's not possible to have it as a member variable, it should be documented so that other users of Atlas are aware of the potentially unwanted behaviour.

Additional context

No response

Organisation

Met Office

odlomax commented 9 months ago

I think the difficulty with having a member variable is that you now have two unsynchronised values which represent levels, i.e. metadata.getInt("levels") and an int in the field object.

Perhaps the performant yet readable solution is to have metadata which defines the horizontal and vertical dimensions of the field, i.e.,

const auto horizDim = metadata.getInt("horizontal dimension", 0);
const auto vertDim = metadata.getInt("vertical dimension", 1);

// could implement `Field::horizontalDim()` and `Field::verticalDim()` to simplify

for (auto horiz = idx_t{0}; horiz < fieldView.shape(horizDim); ++horiz) {
  for (auto vert = idx_t{0}; vert < fieldView.shape(vertDim); ++vert) {
    // some stuff with fieldView(horiz, vert)...
  }
}

@wdeconinck, I think this is something that was discussed last year in Boulder, right?

wdeconinck commented 9 months ago

The idea is that the Field only holds an Array, Metadata and a reference to the FunctionSpace. Anything like field.levels() is just a shorthand for getting it out of the Metadata via field.metadata().getInt("levels") . It should indeed be noted that this is not a cheap operation and that you may instead like to: const idx_t levels = field.levels(); and use levels inside the computational loop, or you can use the array-view's shape function. Even if it was stored as an extra variable, Field contains a FieldImpl* pointer and such indirection will also not be good for performance. It is also possibly better to extract loop bounds before the computational loop for compilers to have a better chance at optimising.

@odlomax , w.r.t. horizontal indices, this is indeed also in the metadata, and there's a short-hand to get it out:

std::vector<idx_t> Field::horizontal_dimensions() const {
        std::vector<idx_t> h_dim{0};
        metadata().get("horizontal_dimension", h_dim);
        return h_dim;
}

Note that there could be more than 1 horizontal dimension.

mo-joshuacolclough commented 9 months ago

Thank you for the comprehensive response @wdeconinck.

Even if it was stored as an extra variable, Field contains a FieldImpl* pointer and such indirection will also not be good for performance.

I hadn't considered this.. In general then it is best to pull this information out of a field prior to looping. I'll close this issue for now as it's clear that this was the intended design.