ECMWFCode4Earth / vAirify

code repository for 2024 Code for Earth project #16
MIT License

As a tester I have an automated integration test suite for the CAMs ETL pipeline so that system quality does not regress #55

Closed rstrange-scottlogic closed 2 weeks ago

rstrange-scottlogic commented 1 month ago

Acceptance Criteria:

mnyamunda-scottlogic commented 1 month ago

Basic process

mnyamunda-scottlogic commented 4 weeks ago

For regression, we target specific documents and assert the values stored in the database. A failure of these tests would simply be an alert that the behaviour of the ETL process has changed.

We plan to use xarray to grab a raw value from our GRIB file, manually work through all the calculations, and store the result in a variable. We then assert that this variable matches the value in our database, as opposed to asserting a hard-coded value such as 10.321472471264862.
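A minimal sketch of that idea, with the "manual calculation" written out by hand so it does not depend on the library under test. The grid values, the `bilinear` helper and `stored_value` are all hypothetical stand-ins; a real test would read the raw values from the GRIB file and the stored value from the database. The 10^9 scaling assumes the raw field is in kg/m^3 and the stored value in µg/m^3.

```python
import math


def bilinear(x, y, x0, x1, y0, y1, q00, q10, q01, q11):
    """Bilinear interpolation inside the grid cell [x0, x1] x [y0, y1].

    qXY is the grid value at (xX, yY); for example q10 is the value at (x1, y0).
    """
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    along_y0 = q00 * (1 - tx) + q10 * tx
    along_y1 = q01 * (1 - tx) + q11 * tx
    return along_y0 * (1 - ty) + along_y1 * ty


KG_TO_UG = 10 ** 9  # assumption: raw values in kg/m^3, stored values in µg/m^3

# Hypothetical raw values at the four grid points surrounding the target.
raw = bilinear(55.0, 15.0, 50.0, 60.0, 10.0, 20.0,
               1.0e-8, 2.0e-8, 3.0e-8, 4.0e-8)
expected = raw * KG_TO_UG

stored_value = 25.0  # hypothetical: would be read from the database
assert math.isclose(expected, stored_value, rel_tol=1e-9)
```

The point of deriving `expected` in the test is that the assertion documents where the number comes from, rather than pinning an opaque literal.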

mnyamunda-scottlogic commented 3 weeks ago

Made extra modifications to run_forecast_etl to allow the base date and time to be overridden from their default values.
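One way such an override could work is sketched below. This is not the actual run_forecast_etl change: the variable name `FORECAST_BASE_TIME`, its format and the `resolve_base_datetime` helper are all hypothetical, chosen only to illustrate the pattern of falling back to a default when no override is supplied.

```python
from datetime import datetime

OVERRIDE_VAR = "FORECAST_BASE_TIME"  # hypothetical variable name
OVERRIDE_FORMAT = "%Y-%m-%d %H"      # hypothetical format, e.g. "2024-06-04 00"


def resolve_base_datetime(env, default):
    """Return the overridden base date/time if one is set, else the default."""
    raw = env.get(OVERRIDE_VAR)
    if not raw:
        return default
    return datetime.strptime(raw, OVERRIDE_FORMAT)


default_base = datetime(2024, 7, 1, 12)

# No override: the default is used.
unchanged = resolve_base_datetime({}, default_base)

# Override pinned to the date of a known GRIB file, so results are repeatable.
pinned = resolve_base_datetime({OVERRIDE_VAR: "2024-06-04 00"}, default_base)
```

In a real run `env` would typically be `os.environ`; passing a plain dict keeps the helper easy to test.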

As testers, we used xarray ourselves to open a known GRIB file and manually perform the calculations required to reach the final value stored in the DB. Having obtained an exact match for single-level data and a very near match for multi-level data, we can confidently assert the exact values coming from the database.
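The two levels of agreement described above could be expressed as separate comparison modes. In this sketch the `values_match` helper, the sample numbers and the tolerance of 1e-6 are assumptions for illustration; the thread does not state how close the multi-level match actually is.

```python
import math


def values_match(expected, actual, rel_tol=None):
    """Compare exactly when rel_tol is None, otherwise within a relative tolerance."""
    if rel_tol is None:
        return expected == actual
    return math.isclose(expected, actual, rel_tol=rel_tol)


# Single-level data: the manual calculation reproduces the stored value exactly.
single_ok = values_match(10.321472471264862, 10.321472471264862)

# Multi-level data: a very near match, so compare within a small tolerance.
multi_ok = values_match(7.1234567, 7.1234569, rel_tol=1e-6)

# The same pair fails an exact comparison, which is why a tolerance is needed.
multi_exact = values_match(7.1234567, 7.1234569)
```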

Example code:

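import pprint

import scipy.interpolate
import xarray as xr


def epic_interpolation():
    # Known single-level GRIB file used as the reference input.
    file_path = "single_level_2024-06-04_00.grib"
    ds = xr.open_dataset(file_path, engine="cfgrib")

    latitudes = ds['latitude'].values
    longitudes = ds['longitude'].values
    # PM2.5 field at the first forecast step.
    pm2_5_data = ds['pm2p5'].isel(step=0).values

    target_lat = 25.0657
    target_lon = 55.17128

    # Shift negative longitudes into the 0-360 range (assumes the grid uses it).
    if target_lon < 0:
        target_lon += 360

    # Note: interp2d was removed in SciPy 1.14, so this call needs an older SciPy.
    interpolator = scipy.interpolate.interp2d(longitudes, latitudes, pm2_5_data, kind='linear')
    pm2_5_value = interpolator(target_lon, target_lat)

    # Scale by 10^9, i.e. kg/m^3 to µg/m^3 if the raw field is in kg/m^3.
    pprint.pprint(pm2_5_value[0] * 10 ** 9)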
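Since `scipy.interpolate.interp2d` was deprecated in SciPy 1.10 and removed in 1.14, the same linear interpolation on a regular grid could be done with `RegularGridInterpolator` on newer versions. The sketch below is a swapped-in alternative, not the code used in this issue, and it runs on a small synthetic field rather than a GRIB file; the grid values are made up so that the expected result can be checked by hand.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Synthetic grid standing in for the GRIB coordinates.
# Latitudes are deliberately descending, as they may be in the real data.
latitudes = np.array([30.0, 20.0, 10.0])
longitudes = np.array([50.0, 60.0])

# Synthetic field that is linear in both coordinates (shape: lat x lon),
# so linear interpolation should reproduce it exactly.
data = latitudes[:, None] * 1e-9 + longitudes[None, :] * 1e-10

# Sort latitudes ascending, reordering the data rows to match.
lat_order = np.argsort(latitudes)
interpolator = RegularGridInterpolator(
    (latitudes[lat_order], longitudes),
    data[lat_order, :],
    method="linear",
)

target_lat = 25.0657
target_lon = 55.17128

# Points are given in the same axis order as the grid: (latitude, longitude).
raw_value = interpolator([[target_lat, target_lon]])[0]
scaled_value = raw_value * 10 ** 9
```

Note the argument order: `interp2d` took `(x, y)` as `(longitude, latitude)`, whereas here each point follows the axis order of the data array.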