Deltares / xugrid

Xarray and unstructured grids
https://deltares.github.io/xugrid/
MIT License
53 stars 8 forks source link

``__eq__`` is not implemented and equals isn't great either #205

Closed JoerivanEngelen closed 5 months ago

JoerivanEngelen commented 5 months ago

By default Python checks for object identity when __eq__ is not implemented. This means that grid objects are only considered equal if they refer to the same object exactly. This results in problems if you instantiate the same grid twice, and then concatenate the result; the resulting UgridDataset then has two separate topologies with the same name. Furthermore, it's currently possible to join two different topologies in a UgridDataArray if the dimensions happen to match exactly.

Moreover, .equals now only tests if grids share the same type, so is too permissive.

Ideally, we match the behavior of xarray as much as possible. In xarray terms, the Ugrid topology is an Index. So the equals behavior should match the equals behavior of xarray Indexes. These are mostly pandas Indexes: the equals check looks like:

    def equals(self, other: Index):
        if not isinstance(other, PandasIndex):
            return False
        return self.index.equals(other.index) and self.dim == other.dim

See: https://github.com/pydata/xarray/blob/e22b47511f4188e2203c5753de4a0a36094c2e83/xarray/core/indexes.py#L794C1-L797C72

The challenge is that the Ugrid topology object is more complicated, it has:

A reasonable assumption seems:

E.g. if derived coordinates such as centroid_coordinates are slightly dissimilar, the equals check should fail -- since that's what xarray does as well.

An overly strict implementation could be very short: simply convert to xarray via .to_dataset() and use xarray to compare for equality. This may be possible, depending on options available in the xarray .equals() methods.

JoerivanEngelen commented 5 months ago

To quickly solve this now, maybe:

def __eq__(self, other) -> bool:
    if other is self:
         return True
    elif not instance(other, type(self)):
         return False
    else:
         xr_self = self.to_dataset()
         xr_other = other.to_dataset()
         return xr_self.identical(xr_other)

This is arguably too strict (because it doesn't identify optional attributes). But better to be too strict than too accepting.

JoerivanEngelen commented 5 months ago

More breadcrumbs:

Being not strict potentially creates quite some headache as well, because it essentially requires an align option. Xarray perhaps takes care of this, since we could do an outer join on the objects. However, this would then introduce a new grid object, which cannot be instantiated directly, since from_dataset only does limited initialization of optional attributes.

Huite commented 5 months ago

Some other thoughts I had: the to_dataset doesn't actually include all optional attributes. So it's only somewhat strict and actually quite a nice way of testing (the to_dataset decides what's essential info and what isn't). It's potentially only a bit strict on attributes; we could skip checking those via the standard xarray .equals() method and add some code to specifically check those ourselves separately.

Regardless, almost anything is an improvement over the current implementation...

JoerivanEngelen commented 5 months ago

Update:

@Huite and I discussed that __eq__ could potentially behave similar as the PandasIndex: returning an array of bools, but we are undecided if this behavior is warranted. More useful is a is_equivalent method which tests if differently ordered topologies are equal.

The code suggested here is implemented as .equals:

To quickly solve this now, maybe:

def __eq__(self, other) -> bool:
    if other is self:
         return True
    elif not instance(other, type(self)):
         return False
    else:
         xr_self = self.to_dataset()
         xr_other = other.to_dataset()
         return xr_self.identical(xr_other)

This is arguably too strict (because it doesn't identify optional attributes). But better to be too strict than too accepting.

Closing this issue for now. Reopen if there is a demand for __eq__.