Implement API endpoints for querying data

ricardogsilva commented 5 months ago

Things that would be helpful to have:

Ask for some processing to be applied to the series, like smoothing
Request a variable but be able to get related series, like uncertainty bounds, closest observations station, different trendlines, etc.

A redesigned API would look like this:

GET /v2/api/coverages/time-series/{coverage-identifier}?
  coords={wkt}&
  datetime={datetime}&
  include_smoothed_data={smoothing_algorithm}&
  include_uncertainty=true&
  include_closest_observation_station=true&
  include_trendline_a=true&
  include_trendline_b=true

coverage-identifier - already encapsulates the temporal aggregation of the data (monthly, seasonal, etc.)
coords - a WKT string for a point or multipoint with the location(s) to be sampled
datetime - time range specifying the temporal interval of the data
include_smoothed_data - requests an additional time series derived from the original data by applying a smoothing algorithm, chosen from a predetermined list of algorithms
include_uncertainty - requests two additional time series with the upper and lower confidence limits of the data. These are generated with the same smoothing algorithm as the main variable, if one is selected. Otherwise the response shows the raw uncertainty values
include_closest_observation_station - requests an additional time series with values obtained from the closest observation station (within a predetermined distance threshold), for the same variable. These values are processed with the same smoothing algorithm as the main variable, if any. Otherwise the response shows the raw observation values
include_trendline_X - requests different types of trend lines related to the data

The response would be a flat list of measurements, with suitable labels:

[
  {
    "timeseries": "tas_mean",
    "datetime": "1982-01-01T00:00:00Z",
    "value": 5.2
  },
  {
    "timeseries": "tas_mean_smoothed",
    "datetime": "1982-01-01T00:00:00Z",
    "value": 5.1
  },
  {
    "timeseries": "tas_mean-uncertainty_upper_bound",
    "datetime": "1982-01-01T00:00:00Z",
    "value": 7.2
  },
  {
    "timeseries": "tas_mean-uncertainty_lower_bound",
    "datetime": "1982-01-01T00:00:00Z",
    "value": 4.7
  },
  {
    "timeseries": "tas_mean-trendline_a",
    "datetime": "1982-01-01T00:00:00Z",
    "value": 5.0
  },
  {
    "timeseries": "station1",
    "datetime": "1982-01-01T00:00:00Z",
    "value": 4.9
  }
]
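
A client would probably regroup this flat list by label before plotting. A minimal sketch in Python, assuming the response has already been decoded into a list of dicts with the keys shown above. The function name group_by_series and the second tas_mean record are hypothetical, for illustration only:

```python
from collections import defaultdict


def group_by_series(measurements):
    """Group flat measurement records into {label: [(datetime, value), ...]}."""
    grouped = defaultdict(list)
    for item in measurements:
        grouped[item["timeseries"]].append((item["datetime"], item["value"]))
    # ISO 8601 timestamps in the same format and time zone sort
    # chronologically when compared as plain strings
    return {label: sorted(points) for label, points in grouped.items()}


# illustrative payload, deliberately out of chronological order
response = [
    {"timeseries": "tas_mean", "datetime": "1982-02-01T00:00:00Z", "value": 5.4},
    {"timeseries": "tas_mean", "datetime": "1982-01-01T00:00:00Z", "value": 5.2},
    {"timeseries": "station1", "datetime": "1982-01-01T00:00:00Z", "value": 4.9},
]
series = group_by_series(response)
```

The flat layout keeps the server side simple, at the cost of one grouping pass like this on the client.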

Alternatively, the response may be a CoverageJSON object, but this needs further investigation.

ricardogsilva commented 5 months ago

Some of the additional data series, such as specific types of trendlines, would need to accept additional runtime parameters (e.g. start year, end year, significance of values, etc.).

gmassaroarpav commented 5 months ago

Please also add:

include_trendvalues: output values from trend functions (see the Mann-Kendall function in the elaborazioni_stat.pdf document): intercept, slope, p-value. The slope is used for the trend value (e.g. °C/year), the p-value for the statistical significance.
include_fixed_mean: would serve for requesting an additional time series that is derived from the original data by employing a ten-year fixed mean (see the elaborazioni_stat.pdf document).
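
For reference, the three outputs listed above (slope, intercept, p-value) can be obtained from the textbook Mann-Kendall test combined with the Theil-Sen slope estimator. The sketch below is only an illustration under stated assumptions: it is not taken from elaborazioni_stat.pdf, it omits the tie correction, it uses the normal approximation for the p-value, and the intercept convention (line through the medians) is one of several in use. The function name mann_kendall_trend is hypothetical:

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall_trend(years, values):
    """Return (slope, intercept, p_value) for a yearly series.

    Textbook Mann-Kendall test without tie correction, with the Theil-Sen
    estimator for the slope. Expects distinct years in ascending order and
    at least two observations.
    """
    pairs = list(combinations(zip(years, values), 2))
    # S counts increasing pairs minus decreasing pairs
    s = sum((b > a) - (b < a) for (_, a), (_, b) in pairs)
    n = len(values)
    variance = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    # two-sided p-value from the normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Theil-Sen slope: median of all pairwise slopes (value units per year)
    slope = median((b - a) / (tb - ta) for (ta, a), (tb, b) in pairs)
    # assumption: intercept chosen so the line passes through the medians
    intercept = median(values) - slope * median(years)
    return slope, intercept, p_value
```

With these outputs, a trend line value for a given year would be intercept + slope * year, assuming the intercept is expressed relative to year 0 as in this sketch.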

ricardogsilva commented 5 months ago

Upon further analysis of requirements, an alternative design has been chosen:

Time series for observations data

Chart data for observations data has the following requirements:

charts are plotted for each observations variable (obsvar) (e.g. TDd, TDx, etc.)
data may be smoothed with some smoothing algorithm (e.g. centered 5-year moving average)
plots may include an additional series with aggregated data for each decade in the requested temporal range
plots may include an additional time series which is the result of calculating the Mann-Kendall trend, with client-provided input values (start year, end year)
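
To make the smoothing and decade aggregation requirements concrete, here is a minimal sketch in plain Python. It assumes yearly values, leaves positions without a full window empty, and buckets decades as 1981-1990, 1991-2000, and so on. How partial decades and window edges should really be handled is not settled here, and both function names are hypothetical:

```python
from statistics import mean


def centered_moving_average(values, window=5):
    """Return a centered moving average; positions without a full window are None."""
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)  # assumption: incomplete windows are left empty
        else:
            smoothed.append(mean(values[i - half : i + half + 1]))
    return smoothed


def decade_means(yearly):
    """Aggregate {year: value} into {(first_year, last_year): mean value}."""
    buckets = {}
    for year, value in yearly.items():
        start = (year - 1) // 10 * 10 + 1  # 1981..1990 all map to 1981
        buckets.setdefault((start, start + 9), []).append(value)
    # assumption: partially covered decades are averaged over the years present
    return {decade: mean(vals) for decade, vals in sorted(buckets.items())}
```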

With these requirements in mind, an API endpoint for requesting observation-related time series will look like this:

/api/v2/observations/{station_id}/time-series/{observation-variable-identifier}?
  datetime={datetime}&
  include_observations_data={true/false}&
  smoothing_algorithm={smoothing_algorithm}&
  include_decade_aggregation={true/false}&
  include_mann_kendall_trend={true/false}&
  mann_kendall_trend_start_year={mann_kendall_start_year}&
  mann_kendall_trend_end_year={mann_kendall_end_year}

Where:

station_id is the identifier of the station where data comes from
observation-variable-identifier is the identifier of the obsvar
datetime is a time range specifying the temporal interval of the data. This is a string with the same semantics as those defined in OGC API EDR. Defaults to null, which means that the response should cover the full temporal range of data for that station
include_observations_data is a boolean specifying whether the observations data time series is to be returned or not. Defaults to true
smoothing_algorithm is a string with the name of the smoothing algorithm to be applied to all observations data. If null then no smoothing is applied. Defaults to null
include_decade_aggregation is a boolean specifying whether to return an additional time series with aggregated values by decade. Defaults to false
include_mann_kendall_trend is a boolean specifying whether to return an additional time series with a trend line that is calculated using the Mann-Kendall method. Defaults to false
mann_kendall_trend_start_year is a string with the start year for the Mann-Kendall trend line. Defaults to null, which means that the first year of the datetime parameter is used as the start year
mann_kendall_trend_end_year is a string with the end year for the Mann-Kendall trend line. Defaults to null, which means that the last year of the datetime parameter is used as the end year
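
The defaulting rule for the two Mann-Kendall year parameters could be resolved roughly as below. This is a sketch, assuming the datetime parameter is either a single instant or a "start/end" interval where ".." or an empty string marks an open end, and that years have four digits. What to do when neither the parameter nor the interval provides a year is left open, so the sketch returns None in that case. The function name resolve_trend_years is hypothetical:

```python
def resolve_trend_years(datetime_param, start_year=None, end_year=None):
    """Fill missing Mann-Kendall years from a 'start/end' datetime interval.

    Returns (start_year, end_year); an entry stays None when it cannot be derived.
    """
    interval_start = interval_end = None
    if datetime_param:
        parts = datetime_param.split("/")
        if len(parts) == 2:
            interval_start, interval_end = parts
        else:
            # a single instant bounds the interval on both sides
            interval_start = interval_end = parts[0]

    def year_of(bound):
        # ".." (or an empty string) marks an open end of the interval
        if bound in (None, "", ".."):
            return None
        return int(bound[:4])

    # explicit parameters arrive as strings and take precedence over the interval
    start = int(start_year) if start_year is not None else year_of(interval_start)
    end = int(end_year) if end_year is not None else year_of(interval_end)
    return start, end
```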

The API response would be a JSON object of the form:

{
  "station_id": "{station_id}",
  "mann_kendall_trend": {
    "slope": 0.6,
    "intercept": 3.4,
    "significance": 0.9
  },
  "values": [
    {
      "value": 5.6,
      "series": "tas_mean",
      "date": "1980-05-01"
    },
    {
      "value": 5.7,
      "series": "tas_mean",
      "date": "1980-06-01"
    },
    {
      "value": 5.4,
      "series": "mann_kendall_trend",
      "date": "1980-06-01"
    },
    {
      "value": 5.4,
      "series": "decade-1981-1990",
      "date": "1981-01-01"
    }
  ]
}

Time series for model data

Chart data for model data has the following requirements:

charts are plotted for each modeled variable (modvar) (e.g. TAS, TASMAX, etc.)
each modvar may have a corresponding observations variable (obsvar) (e.g. TAS corresponds to TDd)
plots may include both the modvar and the corresponding obsvar time series. They may also just include derived data from each of these series
both the modvar and the obsvar may be smoothed with a pre-processing algorithm and the algorithm may be different for each. Plots may include the smoothedmodvar and smoothedobsvar series
a modvar may have two additional uncertainty time series, for upper and lower bounds (uncertmodvar). Plots may include one or both of the uncertmodvar series. In this case they will use the same smoothing algorithm as the modvar
there may be other related modeled variables (relatedmodvars) for a single modvar. These relatedmodvars may be included as additional series. In this case, they will use the same smoothing algorithm as the modvar. Each related modvar in relatedmodvars may also have its own uncertainty time series (lower and upper) (uncert_relatedmodvars). These related uncertainty values may be plotted. In this case, they will use the same smoothing algorithm as their respective variable.

With these requirements in mind, an API for coverage-related charts will look like this:

/api/v2/coverages/time-series/{coverage-identifier}?
  coords={wkt}&
  datetime={datetime}&
  include_coverage_data={true/false}&
  include_observations_data={true/false}&
  coverage_data_smoothing={smoothing_algorithm}&
  observations_data_smoothing={smoothing_algorithm}&
  include_coverage_uncertainty={true/false}&
  include_coverage_related_data={true/false}

Where:

coords is a WKT string for a point or multipoint with the location(s) to be sampled
datetime is a time range specifying the temporal interval of the data. This is a string with the same semantics as those defined in OGC API EDR
include_coverage_data is a boolean specifying whether the coverage data time series is to be returned or not - defaults to true
include_observations_data is a boolean specifying whether the observations data time series is to be returned or not - defaults to true
coverage_data_smoothing is a string with the name of the smoothing algorithm to be applied to all coverage data. If null then no smoothing is applied - defaults to null
observations_data_smoothing is a string with the name of the smoothing algorithm to be applied to all observations data. If null then no smoothing is applied - defaults to null
include_coverage_uncertainty is a boolean specifying whether to include the uncertainty time series in the response. If enabled, also returns the uncertainty for the related data - defaults to true
include_coverage_related_data is a boolean specifying whether to include the related variables' time series in the response - defaults to true
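
As an illustration of how a client might assemble such a request, here is a sketch using only the Python standard library. The function name, the base URL, and the coverage identifier used in the usage lines are all hypothetical. Treating datetime as optional is also an assumption, since no default is stated for it above. Parameters left at null are simply omitted from the query string:

```python
from urllib.parse import quote, urlencode


def build_coverage_time_series_url(
    base_url,
    coverage_identifier,
    coords,
    datetime=None,  # assumption: optional, no default is documented
    include_coverage_data=True,
    include_observations_data=True,
    coverage_data_smoothing=None,
    observations_data_smoothing=None,
    include_coverage_uncertainty=True,
    include_coverage_related_data=True,
):
    """Build the request URL, using the defaults listed above."""
    params = {
        "coords": coords,
        "datetime": datetime,
        "include_coverage_data": include_coverage_data,
        "include_observations_data": include_observations_data,
        "coverage_data_smoothing": coverage_data_smoothing,
        "observations_data_smoothing": observations_data_smoothing,
        "include_coverage_uncertainty": include_coverage_uncertainty,
        "include_coverage_related_data": include_coverage_related_data,
    }
    query = {}
    for name, value in params.items():
        if value is None:
            continue  # null parameters are left out of the query string
        # booleans are rendered as lowercase true/false
        query[name] = str(value).lower() if isinstance(value, bool) else value
    path = f"/api/v2/coverages/time-series/{quote(coverage_identifier, safe='')}"
    return f"{base_url.rstrip('/')}{path}?{urlencode(query)}"


# hypothetical host and coverage identifier
url = build_coverage_time_series_url(
    "https://example.org",
    "tas-annual",
    "POINT(11.5 45.5)",
    include_observations_data=False,
)
```

Note that urlencode percent-encodes the parentheses and the space in the WKT string, so the server needs to decode the coords value before parsing it.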

The API response would be a JSON list of the form:

[
  {
    "value": 5.4,
    "series": "tas_mean",
    "date": "1980-01-01"
  },
  {
    "value": 5.7,
    "series": "tas_mean-uncertainty_upper_bound",
    "date": "1980-01-01"
  }
]

gmassaroarpav commented 5 months ago

The API response for observational data time series would be a JSON object of the form:

{
  "station_id": "{station_id}",
  "mann_kendall_trend": {
    "slope": 0.6,
    "intercept": 3.4,
    "significance": 0.9
  },
  "values": [
    {
      "value": 5.6,
      "series": "tas_mean",
      "date": "1980-05-01"
    },
    {
      "value": 5.7,
      "series": "tas_smooth",
      "date": "1980-06-01"
    },
    {
      "value": 5.4,
      "series": "mann_kendall_trend",
      "date": "1980-06-01"
    },
    {
      "value": 5.4,
      "series": "decade-1981-1990",
      "date": "1981-01-01"
    }
  ]
}

The API response for model data time series would be a JSON list of the form:

[
  {
    "value": 5.4,
    "series": "tas_mean",
    "date": "1980-01-01"
  },
  {
    "value": 5.7,
    "series": "tas_mean-uncertainty_upper_bound",
    "date": "1980-01-01"
  },
  {
    "value": 5.7,
    "series": "tas_mean-uncertainty_lower_bound",
    "date": "1980-01-01"
  },
  {
    "value": 5.7,
    "series": "tas_mean_smoothed",
    "date": "1980-01-01"
  },
  {
    "value": 5.7,
    "series": "tas_mean_related_data",
    "date": "1980-01-01"
  }
]

Note: tas_mean is the ensemble mean and could include the uncertainty (tas_mean-uncertainty_upper_bound, tas_mean-uncertainty_lower_bound). tas_mean_related_data are the other 5 models related to the ensemble mean; these 5 models have no associated uncertainty.

ricardogsilva commented 4 months ago

with the recent merge of #96 we can close this

geobeyond / Arpav-PPCV-backend

Implement API endpoints for querying data #68
