abkfenris / xarray_fmrc

A way to manage forecast Xarray datasets using datatrees
MIT License

How to open/format a sample forecast dataset? #17

Open observingClouds opened 4 months ago

observingClouds commented 4 months ago

Hi @abkfenris,

I found your package following the pangeo discussion about forecast formats and wanted to give this a try. The structure and provided functions seem very appealing.

I know this package is not fully developed yet, but I was wondering if you could give me a quick hint on how to format datasets to match the xarray_fmrc format.

Let's assume the following:

import numpy as np
import datetime
import xarray as xr
import xarray_fmrc

ds0 = xr.Dataset(
    {
        'pres': (['forecast_reference_time', 'time', 'lat', 'lon'], np.random.randint(980, 1000, (1, 5, 10, 10)))
    },
    coords={
        'lat': np.arange(10, 20),
        'lon': np.arange(-60, -50),
        'forecast_reference_time': [datetime.datetime(2020, 1, 1, 0, 0)],
        'forecast_offset': xr.DataArray([datetime.timedelta(hours=h) for h in range(5)], dims='time'),
        'time': [
            datetime.datetime(2020, 1, 1, 0, 0),
            datetime.datetime(2020, 1, 1, 1, 0),
            datetime.datetime(2020, 1, 1, 2, 0),
            datetime.datetime(2020, 1, 1, 3, 0),
            datetime.datetime(2020, 1, 1, 4, 0)
        ]
    }
)

ds1 = xr.Dataset(
    {
        'pres': (['forecast_reference_time', 'time', 'lat', 'lon'], np.random.randint(980, 1000, (1, 5, 10, 10)))
    },
    coords={
        'lat': np.arange(10, 20),
        'lon': np.arange(-60, -50),
        'forecast_reference_time': [datetime.datetime(2020, 1, 1, 12, 0)],
        'forecast_offset': xr.DataArray([datetime.timedelta(hours=h) for h in range(5)], dims='time'),
        'time': [
            datetime.datetime(2020, 1, 1, 12, 0),
            datetime.datetime(2020, 1, 1, 13, 0),
            datetime.datetime(2020, 1, 1, 14, 0),
            datetime.datetime(2020, 1, 1, 15, 0),
            datetime.datetime(2020, 1, 1, 16, 0)
        ]
    }
)

dt = xarray_fmrc.from_dict({datetime.datetime(2020, 1, 1, 0, 0):ds0, datetime.datetime(2020, 1, 1, 12, 0):ds1})

Applying any of the functions provided by xarray_fmrc, e.g. dt.fmrc.constant_offset('1h'), results in the same error:

ValueError: those coordinates do not have an index: {'forecast_offset'}

What am I doing wrong?

Thank you very much for your help!

github-actions[bot] commented 4 months ago

Hello @observingClouds, thank you for your interest in our work!

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

abkfenris commented 4 months ago

You're right, I haven't gotten to do much with this recently so it is still rough, but I still want to work on it, so I'm happy to have you kick the tires.

From a quick look, I think the format is right, but in some cases xarray doesn't automatically create an index for a coordinate. I'm not exactly sure why it doesn't always do it, but how about giving it a nudge to create the index for 'forecast_offset' on both ds0 and ds1?
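As a rough illustration of that nudge, here is a minimal sketch using only plain xarray (no xarray_fmrc calls). It relies on the fact that xarray builds default indexes only for dimension coordinates, i.e. coordinates whose name matches a dimension. In the example above forecast_offset is attached to the time dimension, so it starts out without an index. The small dataset and the variable names here are made up for the sketch.

```python
import numpy as np
import xarray as xr

# Illustrative data only: five hourly offsets from a single model run.
offsets = np.arange(5).astype("timedelta64[h]").astype("timedelta64[ns]")

ds = xr.Dataset(
    {"pres": ("time", np.arange(5))},
    coords={
        # "time" matches a dimension name, so xarray indexes it by default.
        "time": np.datetime64("2020-01-01T00:00", "ns") + offsets,
        # "forecast_offset" lives on the "time" dimension, so it gets no
        # default index.
        "forecast_offset": ("time", offsets),
    },
)

has_index_before = "forecast_offset" in ds.xindexes

# set_xindex returns a new dataset with a default index built for the
# named coordinate.
indexed = ds.set_xindex("forecast_offset")
has_index_after = "forecast_offset" in indexed.xindexes

# With the index in place, label-based selection on the offset works.
one_hour = indexed.sel(forecast_offset=np.timedelta64(1, "h"))

print(has_index_before, has_index_after)
```

Checking ds.xindexes before handing datasets to the accessor is a cheap way to see which coordinates would need this step.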

observingClouds commented 4 months ago

Thanks @abkfenris for your quick response.

Registering forecast_offset explicitly as an index leads to a working dt.fmrc.constant_offset() and dt.fmrc.best(). dt.fmrc.constant_forecast continues to fail with ValueError: those coordinates do not have an index: {'forecast_offset'}.

The issue seems to be that to_dict is not registering any coordinates on the node level, like from_model_runs did. Installing the previous version (969afbf2fbe64cf6d0187994bd2d4f1c7eb4442f) and using dt = xarray_fmrc.from_model_runs([ds0,ds1]) works for all accessor functions.

Was there a reason to remove from_model_runs?

I might play around with this a bit more.

Just for completeness, here are the extra lines of code to set the index of the datasets (which are not needed when using from_model_runs):

ds0 = ds0.set_xindex(coord_names=['forecast_offset'])
ds1 = ds1.set_xindex(coord_names=['forecast_offset'])

abkfenris commented 4 months ago

I'm glad to hear that .from_model_runs worked.

I made that switch to try to generalize things more, hopefully so that the library could evolve and support various datatree structures with different ways of looking up and matching to how folks were already structuring their data. Clearly it isn't quite there yet...

I probably should have an option to do some sort of validation using .to_dict(), or have a separate validation function.
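To make the validation idea concrete, here is one possible shape for such a check, written against plain xarray only. The helper name missing_indexes, its required parameter, and the sample data are all hypothetical and not part of xarray_fmrc. It simply reports which datasets lack an index on the coordinates the accessor functions appear to rely on.

```python
import numpy as np
import xarray as xr


def missing_indexes(datasets, required=("forecast_offset",)):
    """Hypothetical helper: map each key to the required coordinates
    that have no index in that dataset."""
    problems = {}
    for key, ds in datasets.items():
        missing = [name for name in required if name not in ds.xindexes]
        if missing:
            problems[key] = missing
    return problems


# Illustrative data only: one tiny model run with three hourly steps.
offsets = np.arange(3).astype("timedelta64[h]").astype("timedelta64[ns]")
run = xr.Dataset(
    {"pres": ("time", np.arange(3))},
    coords={
        "time": np.datetime64("2020-01-01T00:00", "ns") + offsets,
        "forecast_offset": ("time", offsets),
    },
)

# Before set_xindex the run is flagged; afterwards it passes.
report_before = missing_indexes({"2020-01-01T00": run})
report_after = missing_indexes({"2020-01-01T00": run.set_xindex("forecast_offset")})

print(report_before)
print(report_after)
```

A check along these lines could either raise early with a clear message or call set_xindex on the caller's behalf. Which behavior is preferable is a design decision for the library.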

FYI, I'm about to be largely away from internet access for the next two weeks, so my responses might be a little bit more delayed.