UXARRAY / uxarray

Xarray extension for unstructured climate and global weather data analysis and visualization.
https://uxarray.readthedocs.io/
Apache License 2.0

Dual Mesh Construction #859

Open aaronzedwick opened 1 month ago

aaronzedwick commented 1 month ago

Closes #825

Overview

Constructs the dual mesh of a grid using `face_centers` and `node_face_connectivity`.

Expected Usage

import uxarray as ux

grid_path = "/path/to/grid.nc"

grid = ux.open_grid(grid_path)

dual = grid.compute_dual()
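Conceptually, the construction described above treats each primal face center as a node of the dual grid, and each row of `node_face_connectivity` (the faces surrounding a primal node) as a dual face. A minimal NumPy sketch of that idea, with a hypothetical helper name (not the PR's actual implementation):

```python
import numpy as np

def dual_from_topology(face_lon, face_lat, node_face_connectivity):
    """Sketch: dual nodes are the primal face centers; each primal node's
    surrounding faces (one row of node_face_connectivity) become one dual face."""
    dual_node_lon = np.asarray(face_lon)
    dual_node_lat = np.asarray(face_lat)
    dual_face_node_connectivity = np.asarray(node_face_connectivity)
    return dual_node_lon, dual_node_lat, dual_face_node_connectivity

# Tiny example: two primal faces sharing a node; that node becomes a dual face
lon, lat, conn = dual_from_topology([10.0, 20.0], [0.0, 5.0], [[0, 1]])
```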

PR Checklist

General

Testing

Documentation

aaronzedwick commented 1 month ago

@philipc2 Thoughts on the API? Currently, I return a dual grid object as its own thing. This made sense to me, as changing the original grid would be difficult due to the different dimensions.

philipc2 commented 1 month ago

@philipc2 Thoughts on the API? Currently, I return a dual grid object as its own thing. This made sense to me, as changing the original grid would be difficult due to the different dimensions.

This type of implementation is what I was envisioning, since we don't want anything to be replaced in-place when creating the dual grid.

Let's name it something like Grid.compute_dual(method=...), where the method argument could indicate what type of dual we'd like to compute.
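The suggested signature could dispatch on `method`; a sketch with a minimal stand-in class (the method names and the `"face_centers"` value are hypothetical placeholders, not UXarray's actual API):

```python
class Grid:
    """Minimal stand-in to illustrate the suggested dispatch; not UXarray's Grid."""

    def compute_dual(self, method="face_centers"):
        # Dispatch on the requested type of dual mesh
        if method == "face_centers":
            return self._dual_from_face_centers()
        raise ValueError(f"unsupported dual-mesh method: {method!r}")

    def _dual_from_face_centers(self):
        # A real implementation would build the dual grid from face centers
        # and node_face_connectivity; return a sentinel here for illustration.
        return "dual-grid"
```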

philipc2 commented 1 month ago

We should add functionality that can also convert a UxDataset or UxDataArray into its "dual" form, which would involve making any face-centered data variable node-centered, and vice versa. This would, however, cause issues if the data variable is edge-centered, since those edges would no longer exist.
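The remapping described above amounts to swapping the centering dimension of each variable. A sketch in terms of dimension-name tuples, assuming UGRID-style names `n_face`/`n_node`/`n_edge` (a hypothetical helper, not the PR's implementation):

```python
def swap_centering(dims):
    """Return dims with face/node centering swapped for the dual grid.
    Edge-centered data is rejected, since the dual grid's edges differ
    from the primal ones."""
    if "n_edge" in dims:
        raise ValueError("edge-centered data has no direct dual mapping")
    swap = {"n_face": "n_node", "n_node": "n_face"}
    return tuple(swap.get(d, d) for d in dims)

# A face-centered variable with a time dimension becomes node-centered
dual_dims = swap_centering(("time", "n_face"))
```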

philipc2 commented 1 month ago

This was the block that we were able to get working in the Notebook.

import uxarray as ux
from uxarray.constants import INT_DTYPE, INT_FILL_VALUE
import numpy as np

uxgrid = ux.open_grid("/Users/philipc/uxarray-gold/uxarray-gold/uxarray/test/meshfiles/geos-cs/c12/test-c12.native.nc4")

lonlat_t = [(lon, lat) for lon, lat in zip(uxgrid.node_lon.values, uxgrid.node_lat.values)]

# Dictionary to track the first occurrence of each (lon, lat) pair
# and the indices of any subsequent duplicates
occurrences = {}
for index, tpl in enumerate(lonlat_t):
    if tpl in occurrences:
        occurrences[tpl].append(INT_DTYPE(index))
    else:
        occurrences[tpl] = [INT_DTYPE(index)]

# Map each duplicate node index to the first node with the same coordinates
duplicate_dict = {}
for tpl, indices in occurrences.items():
    if len(indices) > 1:
        source_idx = indices[0]
        for duplicate_idx in indices[1:]:
            duplicate_dict[duplicate_idx] = source_idx

# Rewrite the connectivity so every duplicate points at its source node
new_face_node_connectivity = uxgrid.face_node_connectivity.values.copy().ravel()
for idx, item in enumerate(new_face_node_connectivity):
    # O(1) dict membership check
    if item in duplicate_dict:
        new_face_node_connectivity[idx] = duplicate_dict[item]

new_face_node_connectivity = new_face_node_connectivity.reshape((uxgrid.n_face, uxgrid.n_max_face_nodes))

# Build node -> faces connectivity, skipping fill values
node_face_conn = {node_i: [] for node_i in range(uxgrid.n_node)}
for face_i, face_nodes in enumerate(new_face_node_connectivity):
    for node_i in face_nodes:
        if node_i != INT_FILL_VALUE:
            node_face_conn[node_i].append(face_i)

# Pad each node's face list out to the maximum count with fill values
n_max_node_faces = max(len(face_indices) for face_indices in node_face_conn.values())

node_face_connectivity = np.full((uxgrid.n_node, n_max_node_faces), INT_FILL_VALUE)

for node_idx, face_indices in enumerate(node_face_conn.values()):
    n_faces = len(face_indices)
    node_face_connectivity[node_idx, 0:n_faces] = face_indices

new_uxgrid = ux.Grid.from_topology(uxgrid.node_lon.values,
                                   uxgrid.node_lat.values,
                                   new_face_node_connectivity,
                                   node_face_connectivity=node_face_connectivity)

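The Python-level duplicate-tracking loops in the block above can typically be vectorized with `np.unique`; a hedged sketch of that idea on toy data (not necessarily what this PR ended up doing):

```python
import numpy as np

# Toy node coordinates where node 3 duplicates node 0
node_lon = np.array([10.0, 20.0, 30.0, 10.0])
node_lat = np.array([0.0, 5.0, 5.0, 0.0])
face_node_connectivity = np.array([[0, 1, 2], [3, 2, 1]])

# Group identical (lon, lat) pairs in one shot
lonlat = np.column_stack([node_lon, node_lat])
unique_vals, first_idx, inverse = np.unique(
    lonlat, axis=0, return_index=True, return_inverse=True
)
inverse = np.ravel(inverse)  # flatten for compatibility across NumPy versions

# Map every node index to the first node with the same coordinates;
# note that fill values would need masking out before indexing like this
remap = first_idx[inverse]
new_conn = remap[face_node_connectivity]
```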
aaronzedwick commented 1 month ago

Thanks @philipc2! I will see if I can help figure out what's going on.

aaronzedwick commented 1 month ago

It is definitely something with how you are updating and returning the grid. I did a very basic implementation as a function inside the grid class, called by compute_dual at the start, and it works perfectly. Red is the original grid.

rajeeja commented 1 month ago

It is definitely something with how you are updating and returning the grid. I did a very basic implementation as a function inside the grid class, called by compute_dual at the start, and it works perfectly. Red is the original grid.

Right, it'd be better to compare the grid objects returned in each case, specifically the merged nodes part in both cases (notebook and installed versions)


philipc2 commented 1 month ago

It is definitely something with how you are updating and returning the grid. I did a very basic implementation as a function inside the grid class, called by compute_dual at the start, and it works perfectly. Red is the original grid.

Yeah, I'm almost wondering if there might be a bug somewhere in our Grid.from_topology(). Going to do some more digging.

aaronzedwick commented 1 month ago

It is definitely something with how you are updating and returning the grid. I did a very basic implementation as a function inside the grid class, called by compute_dual at the start, and it works perfectly. Red is the original grid.

Yeah, I'm almost wondering if there might be a bug somewhere in our Grid.from_topology(). Going to do some more digging.

It's working now, though? I think it was just a problem with `pip install .`, which I have been having trouble getting to work. That's what the notebook is for, just testing.

rajeeja commented 1 month ago

This can come out of draft now?

aaronzedwick commented 1 month ago

@philipc2 Are you fine with me splitting this into two PRs and just handling the dual mesh here? That way we can just focus on that and get it merged instead of worrying about why the duplication stuff isn't working?

philipc2 commented 1 month ago

@philipc2 Are you fine with me splitting this into two PRs and just handling the dual mesh here? That way we can just focus on that and get it merged instead of worrying about why the duplication stuff isn't working?

Hi @aaronzedwick

I had a local branch with the changes split, let me get that pushed and set up as a PR.

aaronzedwick commented 1 month ago

@philipc2 Are you fine with me splitting this into two PRs and just handling the dual mesh here? That way we can just focus on that and get it merged instead of worrying about why the duplication stuff isn't working?

Hi @aaronzedwick

I had a local branch with the changes split, let me get that pushed and set up as a PR.

Awesome, thank you Philip!

github-actions[bot] commented 1 month ago

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

| Change | Before [82e46b8c] | After [c5685740] | Ratio | Benchmark (Parameter) |
|--------|-------------------|------------------|-------|-----------------------|
| - | 2.06±0.03ms | 1.81±0.01ms | 0.88 | mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km') |
| - | 1.36±0.02μs | 1.21±0μs | 0.89 | mpas_ocean.ConstructTreeStructures.time_ball_tree('120km') |
|   | failed | 125±1ms | n/a | mpas_ocean.DualMesh.time_dual_mesh_construction('120km') |
|   | failed | 8.92±0.4ms | n/a | mpas_ocean.DualMesh.time_dual_mesh_construction('480km') |
| - | 407M | 354M | 0.87 | mpas_ocean.Integrate.peakmem_integrate('480km') |

Benchmarks that have stayed the same:

| Change | Before [82e46b8c] | After [c5685740] | Ratio | Benchmark (Parameter) |
|--------|-------------------|------------------|-------|-----------------------|
|   | 379M | 380M | 1 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc')) |
|   | 380M | 381M | 1 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc')) |
|   | 383M | 384M | 1 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc')) |
|   | 382M | 382M | 1 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc')) |
|   | 1.33±0s | 1.39±0.01s | 1.05 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc')) |
|   | 179±1ms | 184±1ms | 1.03 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc')) |
|   | 1.63±0s | 1.66±0s | 1.02 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc')) |
|   | 7.86±0.1ms | 7.92±0.2ms | 1.01 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc')) |
|   | 1.76±0.02s | 1.74±0.03s | 0.99 | import.Imports.timeraw_import_uxarray |
|   | 655±10ms | 651±20ms | 0.99 | mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km') |
|   | 40.8±0.3ms | 40.6±1ms | 1 | mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km') |
|   | 472±3μs | 491±5μs | 1.04 | mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km') |
|   | 304±4ns | 304±4ns | 1 | mpas_ocean.ConstructTreeStructures.time_ball_tree('480km') |
|   | 790±1ns | 818±8ns | 1.04 | mpas_ocean.ConstructTreeStructures.time_kd_tree('120km') |
|   | 283±2ns | 293±4ns | 1.04 | mpas_ocean.ConstructTreeStructures.time_kd_tree('480km') |
|   | 398M | 402M | 1.01 | mpas_ocean.GeoDataFrame.peakmem_to_geodataframe('120km', False) |
|   | 388M | 390M | 1.01 | mpas_ocean.GeoDataFrame.peakmem_to_geodataframe('120km', True) |
|   | 362M | 362M | 1 | mpas_ocean.GeoDataFrame.peakmem_to_geodataframe('480km', False) |
|   | 361M | 361M | 1 | mpas_ocean.GeoDataFrame.peakmem_to_geodataframe('480km', True) |
|   | 1.03±0s | 1.06±0s | 1.02 | mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False) |
|   | 57.9±2ms | 59.7±0.7ms | 1.03 | mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True) |
|   | 79.0±0.4ms | 78.9±0.6ms | 1 | mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False) |
|   | 5.59±0.1ms | 5.59±0.2ms | 1 | mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True) |
|   | 263M | 263M | 1 | mpas_ocean.Gradient.peakmem_gradient('120km') |
|   | 240M | 240M | 1 | mpas_ocean.Gradient.peakmem_gradient('480km') |
|   | 2.79±0.06ms | 2.71±0.02ms | 0.97 | mpas_ocean.Gradient.time_gradient('120km') |
|   | 285±1μs | 289±4μs | 1.01 | mpas_ocean.Gradient.time_gradient('480km') |
|   | 370M | 370M | 1 | mpas_ocean.Integrate.peakmem_integrate('120km') |
|   | 175±0.9ms | 179±2ms | 1.02 | mpas_ocean.Integrate.time_integrate('120km') |
|   | 12.2±0.04ms | 11.9±0.06ms | 0.98 | mpas_ocean.Integrate.time_integrate('480km') |
|   | 347±3ms | 343±0.5ms | 0.99 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude') |
|   | 350±2ms | 345±2ms | 0.99 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include') |
|   | 348±3ms | 345±3ms | 0.99 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split') |
|   | 23.0±0.4ms | 23.3±0.3ms | 1.01 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude') |
|   | 22.1±0.3ms | 23.1±0.7ms | 1.05 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include') |
|   | 22.4±0.6ms | 22.3±0.2ms | 0.99 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split') |
|   | 56.7±1ms | 54.9±0.08ms | 0.97 | mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping |
|   | 45.7±0.9ms | 44.4±0.07ms | 0.97 | mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping |
|   | 373±5ms | 360±0.6ms | 0.97 | mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping |
|   | 273±2ms | 263±0.5ms | 0.97 | mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping |
|   | 236M | 236M | 1 | quad_hexagon.QuadHexagon.peakmem_open_dataset |
|   | 236M | 236M | 1 | quad_hexagon.QuadHexagon.peakmem_open_grid |
|   | 6.46±0.03ms | 6.50±0.02ms | 1.01 | quad_hexagon.QuadHexagon.time_open_dataset |
|   | 5.51±0.04ms | 5.59±0.07ms | 1.01 | quad_hexagon.QuadHexagon.time_open_grid |
aaronzedwick commented 1 month ago

@philipc2 This should be ready for review, I made a lot of changes and got it down from 8.69s to 127ms for the 120km benchmark.

philipc2 commented 3 weeks ago

@philipc2 This should be ready for review, I made a lot of changes and got it down from 8.69s to 127ms for the 120km benchmark.

That's a significant improvement, nicely done!

I'm checking the compatibility with #879 and we can see where to move from there.

philipc2 commented 3 weeks ago

Continuing one of my comments above, we should create a UxDataArray.get_dual() method that keeps the data stored under the Data Array, but swaps the mapping such that:

  • Face Centered variables become Node Centered
  • Node Centered variables become Face Centered
  • An error is raised for edge centered variables

aaronzedwick commented 3 weeks ago

Continuing one of my comments above, we should create a UxDataArray.get_dual() method that keeps the data stored under the Data Array, but swaps the mapping such that:

  • Face Centered variables become Node Centered
  • Node Centered variables become Face Centered
  • An error is raised for edge centered variables

Data array or dataset? We load in datasets more often than anything else, right?

aaronzedwick commented 3 weeks ago

Also, something with the latest commits to main messed up my tests.

aaronzedwick commented 2 weeks ago

Continuing one of my comments above, we should create a UxDataArray.get_dual() method that keeps the data stored under the Data Array, but swaps the mapping such that:

  • Face Centered variables become Node Centered
  • Node Centered variables become Face Centered
  • An error is raised for edge centered variables

Data array or dataset? We load in datasets more often than anything else, right?

@philipc2

philipc2 commented 2 weeks ago

Continuing one of my comments above, we should create a UxDataArray.get_dual() method that keeps the data stored under the Data Array, but swaps the mapping such that:

  • Face Centered variables become Node Centered
  • Node Centered variables become Face Centered
  • An error is raised for edge centered variables

Data array or dataset? We load in datasets more often than anything else, right?

I would suggest making a DataArray implementation (i.e. for a single data variable), and then also create a Dataset version. The user should be able to call either.

I believe edge-centered data should actually remain the same mapping in both the dual & primal grids, since the new edge will intersect the original one.

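The two-level implementation suggested above (a per-variable get_dual() plus a Dataset version delegating to it) could be sketched as follows; all class and attribute names here are hypothetical stand-ins, not UXarray's actual classes:

```python
class DataArraySketch:
    """Toy stand-in for UxDataArray: a name, a dims tuple, and the data."""

    def __init__(self, name, dims, data):
        self.name, self.dims, self.data = name, dims, data

    def get_dual(self):
        # Keep the data, swap the centering dimension for the dual grid
        swap = {"n_face": "n_node", "n_node": "n_face"}
        return DataArraySketch(
            self.name, tuple(swap.get(d, d) for d in self.dims), self.data
        )

class DatasetSketch:
    """Toy stand-in for UxDataset: the Dataset-level get_dual() simply
    delegates to each variable's get_dual()."""

    def __init__(self, variables):
        self.variables = variables  # dict of name -> DataArraySketch

    def get_dual(self):
        return DatasetSketch({k: v.get_dual() for k, v in self.variables.items()})

ds = DatasetSketch({"psi": DataArraySketch("psi", ("n_face",), [1, 2, 3])})
dual_ds = ds.get_dual()
```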

aaronzedwick commented 2 weeks ago

Continuing one of my comments above, we should create a UxDataArray.get_dual() method that keeps the data stored under the Data Array, but swaps the mapping such that:

  • Face Centered variables become Node Centered
  • Node Centered variables become Face Centered
  • An error is raised for edge centered variables

Data array or dataset? We load in datasets more often than anything else, right?

I would suggest making a DataArray implementation (i.e. for a single data variable), and then also create a Dataset version. The user should be able to call either.

I believe edge-centered data should actually remain the same mapping in both the dual & primal grids, since the new edge will intersect the original one.


Not if the mesh has holes; then there will be fewer edges.

aaronzedwick commented 2 days ago

@philipc2 This is ready for review; feedback on the notebook I put together would be especially appreciated.

philipc2 commented 1 day ago

Left an initial notebook review. Will give it a deeper look through tomorrow.