csiro-coasts / emsarray

xarray extension that supports EMS model formats
BSD 3-Clause "New" or "Revised" License

Allow passing either Dataset or Format instances to operations #41

Closed mx-moth closed 1 year ago

mx-moth commented 1 year ago

Operations should accept either an xarray.Dataset or an emsarray.formats.Format instance. Each of these types has enough information to access the other.

Rationale

Sometimes the correct Format is difficult to autodetect, or developers want to customise the Format for a specific use case. The dataset.ems attribute does not allow developers to customise the Format being used, and does not allow assignment, so an operation that only accepts a Dataset cannot use a customised Format instance.

Implementation

A Format holds a reference to its associated Dataset, and a Dataset has the dataset.ems attribute. A new utility function should be added:

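from typing import Tuple, Union

import xarray as xr

from emsarray.formats import Format

# Either type carries enough information to reach the other.
DatasetOrFormat = Union[xr.Dataset, Format]

def dataset_and_format(df: DatasetOrFormat) -> Tuple[xr.Dataset, Format]:
    """Return the (dataset, format) pair for either kind of argument."""
    if isinstance(df, xr.Dataset):
        # A bare Dataset falls back to the Format found via dataset.ems.
        return df, df.ems
    if isinstance(df, Format):
        # A Format already holds a reference to its Dataset.
        return df.dataset, df
    raise TypeError(f"Unknown argument type: {type(df)!r}")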

Operations would be written as:

from emsarray.utils import DatasetOrFormat, dataset_and_format

def foo_operation(df: DatasetOrFormat, ...):
    dataset, format = dataset_and_format(df)
    ...
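
To make the calling pattern concrete, here is a minimal sketch that runs without emsarray or xarray installed. FakeDataset, FakeFormat, and describe are hypothetical stand-ins invented for this example, not part of the emsarray API; only the shape of dataset_and_format follows the proposal above.

```python
from typing import Tuple, Union


class FakeFormat:
    """Hypothetical stand-in for emsarray.formats.Format."""

    def __init__(self, dataset: "FakeDataset", name: str = "autodetected"):
        self.dataset = dataset  # a Format holds a reference to its Dataset
        self.name = name


class FakeDataset:
    """Hypothetical stand-in for xarray.Dataset with an .ems attribute."""

    def __init__(self, values):
        self.values = values

    @property
    def ems(self) -> FakeFormat:
        # Read-only, mimicking autodetection: callers cannot assign to it.
        return FakeFormat(self)


DatasetOrFormat = Union[FakeDataset, FakeFormat]


def dataset_and_format(df: DatasetOrFormat) -> Tuple[FakeDataset, FakeFormat]:
    if isinstance(df, FakeDataset):
        return df, df.ems
    if isinstance(df, FakeFormat):
        return df.dataset, df
    raise TypeError(f"Unknown argument type: {type(df)!r}")


def describe(df: DatasetOrFormat) -> str:
    """Example operation (assumed name) that accepts either type."""
    dataset, fmt = dataset_and_format(df)
    return f"{len(dataset.values)} values, format={fmt.name}"


dataset = FakeDataset([1, 2, 3])
custom = FakeFormat(dataset, name="custom")

from_dataset = describe(dataset)  # uses the autodetected format
from_format = describe(custom)    # uses the customised format
```

Passing the dataset goes through the autodetected format, while passing the customised format keeps it, which is the behaviour the proposal is after.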

Internally, any time an operation is called, we should pass the Format instance if one is already present (i.e. pass self instead of self.dataset from Format methods), to ensure custom Formats are respected.
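
The difference between passing self and passing self.dataset can be sketched as follows. The classes and the cell_count operation are hypothetical stand-ins invented for illustration, assuming that dataset.ems always yields a freshly autodetected format.

```python
class FakeDataset:
    """Hypothetical stand-in for xarray.Dataset."""

    def __init__(self, values):
        self.values = values

    @property
    def ems(self) -> "FakeFormat":
        # Autodetection knows nothing about any customised Format.
        return FakeFormat(self)


class FakeFormat:
    """Hypothetical stand-in for emsarray.formats.Format."""

    label = "autodetected"

    def __init__(self, dataset: FakeDataset):
        self.dataset = dataset

    def report_passing_self(self) -> str:
        # Recommended: hand the operation this Format instance.
        return cell_count(self)

    def report_passing_dataset(self) -> str:
        # Discouraged: the operation re-derives a Format from the dataset.
        return cell_count(self.dataset)


class CustomFormat(FakeFormat):
    """A developer-customised Format."""

    label = "custom"


def dataset_and_format(df):
    if isinstance(df, FakeDataset):
        return df, df.ems
    if isinstance(df, FakeFormat):
        return df.dataset, df
    raise TypeError(f"Unknown argument type: {type(df)!r}")


def cell_count(df) -> str:
    """Example operation (assumed name) accepting a dataset or a format."""
    dataset, fmt = dataset_and_format(df)
    return f"{fmt.label}: {len(dataset.values)}"


custom = CustomFormat(FakeDataset([10, 20]))
kept = custom.report_passing_self()
lost = custom.report_passing_dataset()
```

In this sketch, the method that forwards self.dataset silently falls back to the autodetected format, which is the failure mode the paragraph above warns about.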

mx-moth commented 1 year ago

This is no longer required. A specific Convention class can be bound to a dataset, overriding the autodetection that normally happens with dataset.ems.