Store forecast data per model base date

amehta-scottlogic commented 1 month ago

There are two independent time dimensions when talking about forecast data, plus a third value derived from them:

Base time: when the forecast is produced
Range: how far ahead the forecast looks, i.e. the steps in the CAMS request, usually in hours
Valid (effective) time: when the forecast applies, which is base time + range
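
As a minimal sketch of the relationship above, the valid time can be derived from the base time and the range. The function name is hypothetical, and the range is assumed to be a whole number of hours:

```python
from datetime import datetime, timedelta, timezone


def forecast_valid_time(base_time: datetime, range_hours: int) -> datetime:
    """Return the time a forecast step applies to: base time + range."""
    return base_time + timedelta(hours=range_hours)


# A forecast produced at 2024-05-17T00:00 UTC with a 120 hour step
base = datetime(2024, 5, 17, 0, 0, tzinfo=timezone.utc)
print(forecast_valid_time(base, 120).isoformat())  # 2024-05-22T00:00:00+00:00
```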

At the moment we store only one set of values per valid time (measurement_time), taken from the most recent forecast base time.

E.g.:

If today is 2024-05-17, the forecast base time will be 2024-05-17T00:00, and we store some values as the pollution forecast valid at 2024-05-22T00:00.

Tomorrow the forecast base time will be 2024-05-18T00:00, and we will overwrite any valid times already in our database, e.g. the values for 2024-05-22T00:00 from the previous day's forecast will be overwritten.

We should instead create new entries in this case, so we can see how the forecast for the same effective date changes across different forecast base dates.
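
The difference between the two behaviours can be illustrated with an in-memory stand-in for the collection. This is only a sketch: the `store` helper, the location name and the `value` field are hypothetical, and the key used for the current behaviour is an assumption about how documents are matched today.

```python
from datetime import datetime, timezone


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


def store(collection: dict, key_fields: tuple, doc: dict) -> None:
    """Upsert doc into a dict keyed on the given fields (stand-in for a unique index)."""
    key = tuple(doc[field] for field in key_fields)
    collection[key] = doc


# Two forecasts for the same valid time, produced on consecutive days (placeholder values)
day1 = {"name": "London", "forecast_base_time": utc(2024, 5, 17),
        "forecast_valid_time": utc(2024, 5, 22), "value": 1}
day2 = {"name": "London", "forecast_base_time": utc(2024, 5, 18),
        "forecast_valid_time": utc(2024, 5, 22), "value": 2}

current, proposed = {}, {}
for doc in (day1, day2):
    # Assumed current behaviour: base time is not part of the key, so day2 replaces day1
    store(current, ("name", "forecast_valid_time"), doc)
    # Proposed behaviour: base time is part of the key, so both entries are kept
    store(proposed, ("name", "forecast_base_time", "forecast_valid_time"), doc)

print(len(current))   # 1
print(len(proposed))  # 2
```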

Acceptance Criteria

Model base time is included in each document within the forecast_data collection. The property name should be 'forecast_base_time'. Its value should be the date and time the forecast model was created, which comes from the model base date in the request
The measurement_date field should be renamed to 'forecast_valid_time'
We should also store the 'forecast_range', which is the step value from CAMS
The unique index on the collection should be updated to include the changed fields (using liquibase) and applied to every database after merge
'source' is added as a field with value 'cams-production' to indicate where the data came from
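
A rough sketch of what a document meeting these criteria might look like. The builder function and its parameters are hypothetical, the 'name' and 'location_type' fields are taken from the index keys discussed in this issue, and storing 'forecast_range' as an integer number of hours is an assumption:

```python
from datetime import datetime, timedelta, timezone


def build_forecast_document(name: str, location_type: str,
                            base_time: datetime, range_hours: int) -> dict:
    """Build the time and provenance fields of a forecast_data document."""
    return {
        "name": name,
        "location_type": location_type,
        "forecast_base_time": base_time,    # when the forecast model was created
        "forecast_range": range_hours,      # step value from CAMS (assumed to be hours)
        "forecast_valid_time": base_time + timedelta(hours=range_hours),
        "source": "cams-production",
    }


doc = build_forecast_document(
    "London", "city", datetime(2024, 5, 17, tzinfo=timezone.utc), 120
)
print(doc["forecast_valid_time"].isoformat())  # 2024-05-22T00:00:00+00:00
```

Pollutant values are left out here, since their shape is not described in this issue.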

Test Checklist:

[x] Test Analysis & discussion with developer
[x] Write Test Cases / Charters
[x] Test Cases & Charters Reviewed
[x] Test Cases / Charters Executed
[x] Regression tests executed
[ ] Write Automation tests for regression

mwalker-scottlogic commented 1 month ago

FINDING - KeyError: 'name' when running in situ script.

Steps to reproduce:

pull main
run run_in_situ_etl.py

Expected:

Process completes and in situ database updated

Actual:

Terminal crashes out with KeyError: 'name'

UPDATE - run_in_situ_etl.py no longer shows this error; however, further regression testing is required

mwalker-scottlogic commented 1 month ago

Only the forecast db was regression tested, as in_situ is too unstable

mnyamunda-scottlogic commented 1 month ago

Test Analysis

- Basic verification that all of our added/changed keys exist in each document created. Some checks only confirm that a key exists; others verify that its value is one of an allowed set.

- Verify that uniq_forecast_idx exists as an index and has the keys "forecast_valid_time", "forecast_base_time", "location_type", "name" and "source".
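
One possible way to automate the index check, sketched as a pure function over the dictionary shape returned by pymongo's `Collection.index_information()` (index name mapped to options, with a "key" list of (field, direction) pairs). The function name is hypothetical, and checking the "unique" option is an assumption based on the acceptance criteria:

```python
EXPECTED_INDEX_KEYS = {
    "forecast_valid_time",
    "forecast_base_time",
    "location_type",
    "name",
    "source",
}


def has_expected_unique_index(index_info: dict,
                              index_name: str = "uniq_forecast_idx") -> bool:
    """Return True if the named index exists, is unique, and covers exactly the expected keys."""
    index = index_info.get(index_name)
    if index is None:
        return False
    actual_keys = {field for field, _direction in index["key"]}
    return actual_keys == EXPECTED_INDEX_KEYS and bool(index.get("unique", False))


# Sample input in the same shape as index_information() output
sample = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "uniq_forecast_idx": {
        "v": 2,
        "unique": True,
        "key": [("forecast_valid_time", 1), ("forecast_base_time", 1),
                ("location_type", 1), ("name", 1), ("source", 1)],
    },
}
print(has_expected_unique_index(sample))  # True
```

Against a real database this could be called as `has_expected_unique_index(collection.index_information())`, where `collection` is the forecast_data collection.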
