NOAA-OWP / ngen

Next Generation Water Modeling Engine and Framework Prototype
Other
82 stars 57 forks source link

Gridded Data Selectors #799

Open hellkite500 opened 2 months ago

hellkite500 commented 2 months ago

Following the intergration of the Gridded Forcing Engine (#705), implement gridded data selectors for DataProviders to allow connecting gridded formulations to gridded input data.

program-- commented 2 months ago

Some notes:

Initial Design

Currently, I'm envisioning a gridded data selector as a struct holding cell indices, where Cell is defined as a simple POD struct:

struct Cell {
    std::uint64_t x, y, z;
    double value;
}; // 32 bytes

Then, for selection, we'd just need a GridSpecification:

struct Extent {
    double xmin, xmax, ymin, ymax;
};

struct GridSpecification {
  //! Total number of rows (aka y)
  std::uint64_t rows;

  //! Total number of columns (aka x)
  std::uint64_t columns;

  //! Extent of the grid region (aka min-max corner points)
  Extent extent;
};

Using a GridDataSelector, we then define constructors that take in a grid specification and the type to extract against. For example, a point selector may have a constructor like:

struct SelectorConfig {
  //! Initial time for query.
  //! @todo Refactor to use std::chrono
  time_t init_time;

  //! Duration for query, in seconds.
  //! @todo Refactor to use std::chrono
  long duration_seconds;

  //! Variable to return from query.
  std::string variable_name;

  //! Units for output variable.
  std::string variable_units;
};

struct GridDataSelector;

GridDataSelector::GridDataSelector(
    SelectorConfig config,
    const GridSpecification& grid,
    boost::span<const geojson::coordinate_t> points
);

Where the implementation simply encodes the point coordinates against the grid specification. I'll probably assume planar coordinates, as that simplifies the implementation and the numerical error will most likely be negligible.

Encoding Coordinates

Coordinates can be encoded to a grid axis by taking the size of that axis (upper_bound), computing the spacing (diff), and flooring the relative position. For example, a simple generic implementation is:

std::uint64_t coordinate_to_axis_index(
    double position,
    double min,
    double max,
    std::uint64_t upper_bound
) {
    const double diff = static_cast<double>(upper_bound) / (max - min);

    // Coordinate position is within or on the grid axis bounds
    if (position >= min && position <= max) {
      return std::floor((position - min) * diff);
    }

    // Coordinate position is out of bounds, so
    // return a sentinel or throw an exception
    return static_cast<std::uint64_t>(-1);
}

Providers

Unlike the current data providers, a gridded data provider may look like:

class SomeGridDataProvider : public data_access::DataProvider<Cell, GridDataSelector>
{ ... };

The main difference is in the base class. This will imply the return value

SomeGridDataProvider::get_values(...) -> std::vector<Cell>

So, a GridDataSelector can hold a std::vector<Cell>, populate it with selected cells, pass it to the provider, the provider can fill the Cell::value members, and return the resulting vector of Cells to the consumer. Alternatively, if we want to implement resampling/regridding within this, a new vector corresponding to the target grid could be returned instead.

PhilMiller commented 2 months ago

Just to make sure I don't confuse things with irrelevant comments, this issue's notes are all aimed at models like Noah-OWP-Modular, which will run over (a masked subset of) a regular grid, right? Unstructured mesh nodes/elements will be addressed separately?

program-- commented 2 months ago

@PhilMiller For the first pass of gridded data selectors, yes, I'm just tackling (uniform) structured grids.