UXARRAY / uxarray

Xarray-styled package for reading and directly operating on unstructured grid datasets following UGRID conventions
https://uxarray.readthedocs.io/
Apache License 2.0
139 stars 30 forks source link

Remapping Overwrites Original Data #819

Closed aaronzedwick closed 2 weeks ago

aaronzedwick commented 2 weeks ago

Closes #809

Overview

Fixes the issue of IDW and NN remapping overwriting the destination datasets, so that now the destination dataset/grid/dataarray is left alone, and the assigned variable used when calling the function is the only thing written to.

Expected Usage

import uxarray as ux

grid_path = "/path/to/grid.nc"
data_path = "/path/to/data.nc"

source = ux.open_dataset(grid_path, data_path)
# Destination stays unchanged
destination = ux.open_dataset(grid_path, data_path)

# Remapping results stored in remap_to
remap_to = source['variable'].remap.inverse_distance_weighted(destination_obj=destination)

PR Checklist

General

Testing

rajeeja commented 2 weeks ago

Just to clarify: remap_to = source['variable'].remap.inverse_distance_weighted(destination_obj=destination)

Assumes that destitionation doesn't have variable and if it does?

aaronzedwick commented 2 weeks ago

Just to clarify: remap_to = source['variable'].remap.inverse_distance_weighted(destination_obj=destination)

Assumes that destitionation doesn't have variable and if it does?

It shouldn’t matter if it does or doesn’t. Destination is just what structure we are remapping the data variable to. The ‘remap_to’ holds the results of the remapping. That’s the point of this fix, to make it so if destination has the variable, it isn’t replaced by the remap and is left untouched.

rajeeja commented 2 weeks ago

Just to clarify: remap_to = source['variable'].remap.inverse_distance_weighted(destination_obj=destination) Assumes that destitionation doesn't have variable and if it does?

It shouldn’t matter if it does or doesn’t. Destination is just what structure we are remapping the data variable to. The ‘remap_to’ holds the results of the remapping. That’s the point of this fix, to make it so if destination has the variable, it isn’t replaced by the remap and is left untouched.

ok, so for the case: source['temp'] and destination['temp'] are already present and if source['temp'] is mapped to destination, what happens? where is the source['temp'] written to in the destination?

aaronzedwick commented 2 weeks ago

Nowhere in destination. It is written to the provided variable to the left of the equal sign, in this case ‘remap_to’.

rajeeja commented 2 weeks ago

Nowhere in destination. It is written to the provided variable to the left of the equal sign, in this case ‘remap_to’.

ok, remap_to doesn't have a grid associated? or does it have a copy of the destination grid ?

aaronzedwick commented 2 weeks ago

Nowhere in destination. It is written to the provided variable to the left of the equal sign, in this case ‘remap_to’.

ok, remap_to doesn't have a grid associated? or does it have a copy of the destination grid ?

It does, it has a copy.

rajeeja commented 2 weeks ago

Nowhere in destination. It is written to the provided variable to the left of the equal sign, in this case ‘remap_to’.

ok, remap_to doesn't have a grid associated? or does it have a copy of the destination grid ?

It does, it has a copy.

OK, what is the call if I don't want to create a new grid and use destination mesh object for remapping?

aaronzedwick commented 2 weeks ago

Nowhere in destination. It is written to the provided variable to the left of the equal sign, in this case ‘remap_to’.

ok, remap_to doesn't have a grid associated? or does it have a copy of the destination grid ?

It does, it has a copy.

OK, what is the call if I don't want to create a new grid and use destination mesh object for remapping?

You can’t. Philip told me we shouldn’t allow that. Although maybe I misunderstood what he was communicating. It wouldn’t be hard to add an overwrite condition though if we wanted. At the very least it shouldn’t overwrite by default.

philipc2 commented 2 weeks ago

After giving this some thought, I think the way we use the destination_obj is very awkward, unclear, and breaks out fluent interface that we have with UxDataArray and UxDataset

I think we should require a destination_grid instead, and always return a new UxDataArray. Remapping is the process of taking data from one grid and mapping it to some destination grid. Having the "destination" be a UxDataArray or UxDataset is not clear.

Something like the following would be great. destination_obj as an optional parameter, starting the deprecation process in this PR.


# source dataset
source_uxds= ux.open_dataset(grid_path, data_path)

# destination gird (the example you had opens a dataset, which isn't the most common workflow)
destination_grid = ux.open_grid(grid_path)

# remapping will always return a new `UxDataArray` 
t2m_remapped = uxds['t2m'].remap.nearest_neighbor(destination_grid=destination_grid)

# we dont support remapping an entire dataset, but it would look something like this
uxds_remapped = uxds.remap.nearest_neighbor(destination_grid=destination_grid)
aaronzedwick commented 2 weeks ago

@philipc2 How does something like this look? Once we finish the deprecation period we can remove the redundant stuff of course, but I thought this was a good way to let everything function as it should for now.

philipc2 commented 2 weeks ago

@philipc2 How does something like this look? Once we finish the deprecation period we can remove the redundant stuff of course, but I thought this was a good way to let everything function as it should for now.

This is exactly what I was thinking, it looks great!

Giving it a review now.