Notes

DataArray is returned if reading a single timeseries output variable, e.g., plan_hdf.mesh_timeseries_output("MeshName", "Water Surface")
- Dataset is returned if reading out multiple timeseries output variables, e.g., plan_hdf.mesh_timeseries_output_cells("MeshName"), plan_hdf.mesh_timeseries_output_faces("MeshName")
- Note that output variables based on mesh cells are incompatible with outputs based on mesh faces, since they are based on different array indices
If dask is installed, DataArray/Dataset objects are based on Dask arrays. If not, default to numpy arrays.
- When an array is read from the HDF, rashdf will set the dask array chunk size to match the HDF chunk size. I don't know if it's optimal to let the user set the dask array chunk size upfront.
I looked into Xvec a little bit to see if we could export geometry-aware Datasets / DataArrays. Seems possible, but with only two releases Xvec doesn't seem very mature yet. Importantly, doesn't seem like Xvec Datasets / DataArrays could be easily saved to disk.
- Relevant issues with lengthy discussion:
- https://github.com/xarray-contrib/xvec/issues/26
- https://github.com/xarray-contrib/xvec/issues/48

Examples

Read cell water surface output as DataArray

>>> from rashdf import RasPlanHdf, TimeSeriesOutputVar
>>> plan_hdf.mesh_area_names()
['BaldEagleCr', 'Upper 2D Area']
>>> # read out a DataArray for a single timeseries output variable
>>> ws = plan_hdf.mesh_timeseries_output("BaldEagleCr", TimeSeriesOutputVar.WATER_SURFACE)
>>> ws  # note below that this is a dask-based DataArray, since dask was installed
<xarray.DataArray 'BaldEagleCr - Water Surface' (time: 37, cell_id: 3359)> Size: 497kB
dask.array<getitem, shape=(37, 3359), dtype=float32, chunksize=(12, 3359), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 296B 1999-01-01T12:00:00 ... 1999-01-04T12...
  * cell_id  (cell_id) int64 27kB 0 1 2 3 4 5 ... 3353 3354 3355 3356 3357 3358
Attributes:
    mesh_name:  BaldEagleCr
    variable:   Water Surface
    units:      ft
>>> # select the output for a single cell
>>> ws.sel(cell_id=123)
<xarray.DataArray 'BaldEagleCr - Water Surface' (time: 37)> Size: 148B
dask.array<getitem, shape=(37,), dtype=float32, chunksize=(12,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 296B 1999-01-01T12:00:00 ... 1999-01-04T12...
    cell_id  int64 8B 123
Attributes:
    mesh_name:  BaldEagleCr
    variable:   Water Surface
    units:      ft
>>> # return the values for a single cell
>>> ws.sel(cell_id=123).values
array([537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 ,
       537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 ,
       537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 , 537.2959 ,
       537.2959 , 537.2959 , 537.2959 , 537.39996, 543.78345, 546.0193 ,
       547.2876 , 548.71246, 549.9208 , 550.59985, 550.90826, 550.9861 ,
       550.92456, 550.7785 , 550.5764 , 550.3424 , 550.0916 , 549.85077,
       549.6207 ], dtype=float32)
>>> # check out the time dimension
>>> ws.time
<xarray.DataArray 'time' (time: 37)> Size: 296B
array(['1999-01-01T12:00:00.000000000', '1999-01-01T14:00:00.000000000',
       '1999-01-01T16:00:00.000000000', '1999-01-01T18:00:00.000000000',
       '1999-01-01T20:00:00.000000000', '1999-01-01T22:00:00.000000000',
       '1999-01-02T00:00:00.000000000', '1999-01-02T02:00:00.000000000',
       '1999-01-02T04:00:00.000000000', '1999-01-02T06:00:00.000000000',
       '1999-01-02T08:00:00.000000000', '1999-01-02T10:00:00.000000000',
       '1999-01-02T12:00:00.000000000', '1999-01-02T14:00:00.000000000',
       '1999-01-02T16:00:00.000000000', '1999-01-02T18:00:00.000000000',
       '1999-01-02T20:00:00.000000000', '1999-01-02T22:00:00.000000000',
       '1999-01-03T00:00:00.000000000', '1999-01-03T02:00:00.000000000',
       '1999-01-03T04:00:00.000000000', '1999-01-03T06:00:00.000000000',
       '1999-01-03T08:00:00.000000000', '1999-01-03T10:00:00.000000000',
       '1999-01-03T12:00:00.000000000', '1999-01-03T14:00:00.000000000',
       '1999-01-03T16:00:00.000000000', '1999-01-03T18:00:00.000000000',
       '1999-01-03T20:00:00.000000000', '1999-01-03T22:00:00.000000000',
       '1999-01-04T00:00:00.000000000', '1999-01-04T02:00:00.000000000',
       '1999-01-04T04:00:00.000000000', '1999-01-04T06:00:00.000000000',
       '1999-01-04T08:00:00.000000000', '1999-01-04T10:00:00.000000000',
       '1999-01-04T12:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 296B 1999-01-01T12:00:00 ... 1999-01-04T12...

Read all available cell-based timeseries output as DataArray

>>> # Read all cell timeseries outputs as Dataset
>>> plan_hdf = RasPlanHdf.open_uri("s3://kanawha-pilot/FFRD_Kanawha_Compute/runs/13/ras/ElkMiddle/ElkMiddle.p01.hdf")
>>> ds = plan_hdf.mesh_timeseries_output_cells("ElkMiddle")
>>> ds
<xarray.Dataset> Size: 66MB
Dimensions:                              (time: 577, cell_id: 14188)
Coordinates:
  * time                                 (time) datetime64[ns] 5kB 1996-01-14...
  * cell_id                              (cell_id) int64 114kB 0 1 ... 14187
Data variables:
    Water Surface                        (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
    Cell Cumulative Precipitation Depth  (time, cell_id) float32 33MB dask.array<chunksize=(3, 14188), meta=np.ndarray>
Attributes:
    mesh_name:  ElkMiddle
>>> # get a DataArray for a single output variable
>>> ws = ds["Water Surface"]
>>> ws
<xarray.DataArray 'Water Surface' (time: 577, cell_id: 14188)> Size: 33MB
dask.array<getitem, shape=(577, 14188), dtype=float32, chunksize=(3, 14188), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 5kB 1996-01-14T12:00:00 ... 1996-02-07T12:...
  * cell_id  (cell_id) int64 114kB 0 1 2 3 4 5 ... 14183 14184 14185 14186 14187
Attributes:
    mesh_name:  ElkMiddle
    variable:   Water Surface
    units:      ft
>>> # select cells from the DataArray
>>> ws.sel(cell_id=slice(100, 105))
<xarray.DataArray 'Water Surface' (time: 577, cell_id: 6)> Size: 14kB
dask.array<getitem, shape=(577, 6), dtype=float32, chunksize=(3, 6), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 5kB 1996-01-14T12:00:00 ... 1996-02-07T12:...
  * cell_id  (cell_id) int64 48B 100 101 102 103 104 105
Attributes:
    mesh_name:  ElkMiddle
    variable:   Water Surface
    units:      ft

Read all available mesh face timeseries outputs as Dataset

>>> plan_hdf = RasPlanHdf("/mnt/c/temp/Example_Projects_6_5/Example_Projects/2D Unsteady Flow Hydraulics/BaldEagleCrkMulti2D/BaldEagleDamBrk.p18.hdf")
>>> plan_hdf.mesh_timeseries_output_faces("Upper 2D Area")
<xarray.Dataset> Size: 4MB
Dimensions:                      (time: 37, face_id: 2286)
Coordinates:
  * time                         (time) datetime64[ns] 296B 1999-01-01T12:00:...
  * face_id                      (face_id) int64 18kB 0 1 2 3 ... 2283 2284 2285
Data variables:
    Face Courant                 (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Cumulative Volume       (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Eddy Viscosity          (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Flow                    (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Flow Period Average     (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Friction Term           (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Pressure Gradient Term  (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Shear Stress            (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Tangential Velocity     (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Velocity                (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
    Face Water Surface           (time, face_id) float32 338kB dask.array<chunksize=(21, 2286), meta=np.ndarray>
Attributes:
    mesh_name:  Upper 2D Area

zherbz commented 5 months ago

Just one comment on the mentioned topic of saving DataArrays to disc. I recently ran into this issue dealing with large raster datasets and was able to get by using rioxarray: https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray.raster_dataset.RasterDataset.to_raster.

Using the windowed parameter of the .to_raster() method writes the xarray DataArray to disc in the same size as it was chunked when first being read.

thwllms commented 5 months ago

Just one comment on the mentioned topic of saving DataArrays to disc. I recently ran into this issue dealing with large raster datasets and was able to get by using rioxarray: https://corteva.github.io/rioxarray/html/rioxarray.html#rioxarray.raster_dataset.RasterDataset.to_raster.

Using the windowed parameter of the .to_raster() method writes the xarray DataArray to disc in the same size as it was chunked when first being read.

Gotcha. I've used rioxarray as well. Since this isn't raster data I think we have to write out to Zarr or another format. I hope that Xvec continues to be developed since their data model seems like it would be ideal for our use case (i.e., vector GIS data with multidimensional values)

codecov[bot] commented 5 months ago

Codecov Report

Attention: Patch coverage is 97.82609% with 2 lines in your changes missing coverage. Please review.

Files	Coverage Δ
src/rashdf/utils.py	`90.62% <100.00%> (+0.51%)`	:arrow_up:
src/rashdf/plan.py	`98.85% <97.70%> (-0.59%)`	:arrow_down:

fema-ffrd / rashdf

Mesh Timeseries Output #53

44

45