NOAA-ORR-ERD / gridded

A single API for accessing / working with gridded model results on multiple grid types
https://noaa-orr-erd.github.io/gridded/index.html
The Unlicense

Saving Dataset with time varying data #90

Open krober10nd opened 2 months ago

krober10nd commented 2 months ago

I have a dataset of shape (number of timesteps, number of nodes) representing the free surface on an unstructured triangular mesh. I'd like to save it to a UGRID-compliant file using this package. gridded is not recognizing that my free_surface variable varies in time, and it raises an error upon writing. Any idea what I'm doing incorrectly in the syntax?

Thanks,

"""
Format a NetCDF file to be UGRID compliant using the Python package `gridded`.
"""

import gridded
from gridded.grids import UGrid
import matplotlib.pyplot as plt

import xarray as xr
import numpy as np
from datetime import timedelta, datetime

fname = "r2d_HD_ETC_HISTORICAL_HISTORICAL_OBS_00139.nc"

ds = xr.open_dataset(fname)

nodes = np.c_[ds["x"].values, ds["y"].values]
tris = np.c_[ds["v1"], ds["v2"], ds["v3"]]
times = ds["time"].values
times = [datetime.utcfromtimestamp(t.astype("O") / 1e9) for t in times]

grid = UGrid(nodes=nodes, faces=tris)

grid.build_boundaries()

time_obj = gridded.time.Time(
    data=times,
)

free_surface = ds["free_surface"].values

fs_var = gridded.variable.Variable(
    name='free_surface',
    units="meters",
    time=time_obj,
    data=free_surface.T,
    grid=grid,
)

ds_new = gridded.Dataset(
    grid=grid,
    variables={"free_surface": fs_var},
)

ds_new.save("test.nc")

produces this:

Traceback (most recent call last):
  File "/mnt/c/Users/kroberts/Projects/BD/BineraBaird/Animations/format_file_v1.py", line 32, in <module>
    fs_var = gridded.variable.Variable(
  File "/mnt/c/Users/kroberts/Resources/ugrid/gridded/gridded/variable.py", line 113, in __init__
    self.data = data
  File "/mnt/c/Users/kroberts/Resources/ugrid/gridded/gridded/variable.py", line 381, in data
    raise ValueError("Data/grid shape mismatch: Data shape is {0}, "
ValueError: Data/grid shape mismatch: Data shape is (120950, 425), Grid shape is (120950,)

jay-hennen commented 2 months ago

Swap the data dimensions. The first dimension of the data should be time.
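As a minimal sketch of the layout being asked for: the first traceback reports shape (120950, 425) for `free_surface.T`, so the untransposed array is already (time, node) and dropping the `.T` should be enough. The snippet below uses small stand-in sizes, and `leading_axis_is_time` is a hypothetical helper for illustration, not part of gridded.

```python
import numpy as np

# Small stand-ins for the 425 timesteps x 120950 nodes in this thread.
n_times, n_nodes = 4, 10

# An array laid out node-first, like free_surface.T in the script above.
node_major = np.arange(n_nodes * n_times, dtype=float).reshape(n_nodes, n_times)

# Transposing puts time on the leading axis: shape (n_times, n_nodes).
time_major = node_major.T


def leading_axis_is_time(data, n_times, n_nodes):
    """Hypothetical check: True if data is laid out as (time, node)."""
    return data.shape == (n_times, n_nodes)


print(node_major.shape, leading_axis_is_time(node_major, n_times, n_nodes))
print(time_major.shape, leading_axis_is_time(time_major, n_times, n_nodes))
```

The transpose is a view, so no data is copied; only the axis order changes.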

krober10nd commented 2 months ago

Thanks for the quick response, but now I get this with time as the first dimension:

filename is: test.nc
Saving: gridded.variable.Variable(name="free_surface", time="<gridded.time.Time object at 0x7fa2e481efb0>", units="meters", data="[[0.20214844 0.20214844 0.20214844 ... 0.00585938 0.00585938 0.00683594]
 [0.19140625 0.19140625 0.19140625 ... 0.07128906 0.07421875 0.07519531]
 [0.19433594 0.19433594 0.19433594 ... 0.14746094 0.14941406 0.15039062]
 ...
 [0.21972656 0.21972656 0.21972656 ... 0.01367188 0.01367188 0.01367188]
 [0.23925781 0.23925781 0.23925781 ... 0.04589844 0.046875   0.046875  ]
 [0.25878906 0.25878906 0.25878906 ... 0.08886719 0.08984375 0.08984375]]", )
name is: free_surface
var data is: [[0.20214844 0.20214844 0.20214844 ... 0.00585938 0.00585938 0.00683594]
 [0.19140625 0.19140625 0.19140625 ... 0.07128906 0.07421875 0.07519531]
 [0.19433594 0.19433594 0.19433594 ... 0.14746094 0.14941406 0.15039062]
 ...
 [0.21972656 0.21972656 0.21972656 ... 0.01367188 0.01367188 0.01367188]
 [0.23925781 0.23925781 0.23925781 ... 0.04589844 0.046875   0.046875  ]
 [0.25878906 0.25878906 0.25878906 ... 0.08886719 0.08984375 0.08984375]]
var data shape is: (425, 120950)
new dat var shape: ('mesh_num_node',)
Traceback (most recent call last):
  File "src/netCDF4/_netCDF4.pyx", line 5503, in netCDF4._netCDF4.Variable.__setitem__
ValueError: cannot reshape array of size 51403750 into shape (120950,)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/c/Users/kroberts/Projects/BD/BineraBaird/Animations/format_file_v1.py", line 49, in <module>
    ds_new.save("test.nc")
  File "/mnt/c/Users/kroberts/Resources/ugrid/gridded/gridded/gridded.py", line 171, in save
    self.grid.save(ncds, format='netcdf4', variables=self.variables)
  File "/mnt/c/Users/kroberts/Resources/ugrid/gridded/gridded/pyugrid/ugrid.py", line 1161, in save
    self._save_variables(nclocal, variables)
  File "/mnt/c/Users/kroberts/Resources/ugrid/gridded/gridded/pyugrid/ugrid.py", line 1209, in _save_variables
    data_var[:] = var.data[:]
  File "src/netCDF4/_netCDF4.pyx", line 5505, in netCDF4._netCDF4.Variable.__setitem__
  File "/mnt/c/Users/kroberts/Resources/base/lib/python3.10/site-packages/numpy/lib/_stride_tricks_impl.py", line 422, in broadcast_to
    return _broadcast_to(array, shape, subok=subok, readonly=True)
  File "/mnt/c/Users/kroberts/Resources/base/lib/python3.10/site-packages/numpy/lib/_stride_tricks_impl.py", line 358, in _broadcast_to
    it = np.nditer(
ValueError: input operand has more dimensions than allowed by the axis remapping

krober10nd commented 2 months ago

Shouldn't the time dimension be added on line 1173 of ugrid.py?
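Both errors in the traceback are consistent with the output variable having only the node dimension (the debug output shows `('mesh_num_node',)`) while the data is two-dimensional. A rough NumPy-only reproduction with small stand-in sizes follows; it does not call netCDF4 or gridded, and `raises_value_error` is a hypothetical helper for illustration.

```python
import numpy as np

# Small stand-ins for the (425, 120950) array in this thread.
n_times, n_nodes = 4, 10
data = np.zeros((n_times, n_nodes))


def raises_value_error(func):
    """Hypothetical helper: True if calling func raises ValueError."""
    try:
        func()
    except ValueError:
        return True
    return False


# A (time, node) array cannot be reshaped into a node-only shape ...
reshape_fails = raises_value_error(lambda: data.reshape((n_nodes,)))

# ... nor broadcast to it, because broadcasting cannot drop a dimension.
broadcast_fails = raises_value_error(lambda: np.broadcast_to(data, (n_nodes,)))

# With a target shape that includes time, the same data fits.
time_first_ok = np.broadcast_to(data, (n_times, n_nodes)).shape == data.shape

# The size in the reshape error is timesteps times nodes.
print(425 * 120950)
```

This suggests the fix belongs where the output variable's dimensions are declared, not in the data itself.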


ChrisBarker-NOAA commented 2 months ago

Likely yes.

The core problem is that the saving code isn't well tested, and may not be complete :-(

It was always the goal to use gridded in this way, but our operational needs haven't required it, so it hasn't gotten much attention.

PRs more than welcome!

krober10nd commented 2 months ago

Sure, I can help out, but what's the UGRID convention for the treatment of time variables?

Is it mesh_name + number_of_timesteps?

ChrisBarker-NOAA commented 2 months ago

UGRID doesn't do anything special with time -- we should just follow the CF conventions.
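For reference, a minimal sketch of what following CF means for time: the time coordinate is stored as numbers with a `units` attribute of the form "<unit> since <reference date>". The function name `encode_cf_time` and the 1970 epoch are assumptions chosen for illustration; this is not how gridded encodes time.

```python
from datetime import datetime, timedelta


def encode_cf_time(times, epoch=datetime(1970, 1, 1)):
    """Hypothetical helper: convert naive UTC datetimes to CF-style offsets.

    Returns (values, units), where values are seconds since the epoch and
    units is the matching CF units string.
    """
    units = "seconds since " + epoch.strftime("%Y-%m-%d %H:%M:%S")
    values = [(t - epoch).total_seconds() for t in times]
    return values, units


# Three hourly timesteps starting at 2020-01-01 00:00 UTC (made-up example data).
times = [datetime(2020, 1, 1) + timedelta(hours=h) for h in range(3)]
values, units = encode_cf_time(times)
print(units)
print(values)
```

In the output file, these values would go in a `time` coordinate variable carrying that `units` attribute, and the data variable would be dimensioned with time first, for example `("time", "mesh_num_node")`.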

krober10nd commented 2 months ago

Can you please give me push access so I can upload my branch here? Or do you want me to work on a fork? Thanks

ChrisBarker-NOAA commented 2 months ago

Use a fork for now; if you make more contributions, we can add you.

Thanks!