Cloud-Drift / clouddrift

CloudDrift accelerates the use of Lagrangian data for atmospheric, oceanic, and climate sciences.
https://clouddrift.org/
MIT License

RaggedArray from numerical #40

Open selipot opened 1 year ago

selipot commented 1 year ago

I am following the example dataformat-numerical.ipynb to convert the output of an Ocean Parcels simulation to a ragged array and save it to a NetCDF file, but I do not understand how the time variable is handled or whether the units can be specified. The NetCDF file written by Parcels contains the variable time in units of seconds since a pivot date, but the NetCDF file written by clouddrift after converting to a ragged array seems to be in minutes since the origin of the experiment. I dug through dataformat.py to understand but could not figure it out.

milancurcic commented 1 year ago

I'll play with it and let you know what I find.

selipot commented 1 year ago

The latest version of Ocean Parcels now writes its output in zarr format; see https://github.com/OceanParcels/parcels/releases/tag/v2.4.0. It is a priority to write a new recipe that converts such zarr output (still written as a sparse 2D array) into a RaggedArray. We should also add the ability to write a RaggedArray to zarr with RaggedArray.to_zarr().
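To make the "sparse 2D array to ragged array" step concrete, here is a minimal sketch in plain Python. It assumes the 2D output is padded with NaN where a trajectory has no observation; the function name `to_ragged` and the variable names are hypothetical and are not part of clouddrift or Parcels. A real recipe would more likely work on NumPy or xarray arrays.

```python
import math

def to_ragged(padded):
    """Flatten a NaN-padded 2D list (trajectory x observation) into a
    1D list of valid values plus the number of values per trajectory."""
    values, row_sizes = [], []
    for row in padded:
        # Keep only the observations that actually exist for this trajectory.
        valid = [v for v in row if not math.isnan(v)]
        values.extend(valid)
        row_sizes.append(len(valid))
    return values, row_sizes

nan = float("nan")

# Hypothetical padded longitudes for three trajectories of unequal length.
padded_lon = [
    [10.0, 10.1, 10.2],
    [20.0, nan, nan],
    [30.0, 30.1, nan],
]

lon, row_sizes = to_ragged(padded_lon)
print(lon)        # contiguous observations, trajectory after trajectory
print(row_sizes)  # how many observations belong to each trajectory
```

The row sizes are what allow the 1D array to be split back into individual trajectories, so they have to be stored alongside the data.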

philippemiron commented 1 year ago

> I am following the example dataformat-numerical.ipynb to convert the output of an Ocean Parcels simulation to a ragged array and save it to a NetCDF file, but I do not understand how the time variable is handled or whether the units can be specified. The NetCDF file written by Parcels contains the variable time in units of seconds since a pivot date, but the NetCDF file written by clouddrift after converting to a ragged array seems to be in minutes since the origin of the experiment. I dug through dataformat.py to understand but could not figure it out.

When you open the NetCDF file with decode_times=False, you get the array of "offsets" directly. In that example, I then set the time attributes as: 'long_name': 'Time in days', 'units': 'days since 2021-01-01'. The 'units' attribute is recognized later by the NetCDF library, which uses it to convert the offsets back to dates if needed.
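To illustrate what the 'units' attribute means, here is a rough standard-library sketch of how offsets relate to dates under a "<unit> since <origin>" convention. The helper names (`parse_units`, `decode`, `encode`) are hypothetical and only handle a few units; the NetCDF tooling does this far more generally. The point is that the same instants can be stored as days, seconds, or minutes from any origin, as long as the attribute says which.

```python
from datetime import datetime, timedelta

# Seconds per unit for the time units handled in this sketch.
SECONDS_PER_UNIT = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0, "days": 86400.0}

def parse_units(units):
    """Split a string such as 'days since 2021-01-01' into (unit, origin)."""
    unit, _, origin = units.partition(" since ")
    return unit.strip(), datetime.fromisoformat(origin.strip())

def decode(offsets, units):
    """Convert numeric offsets to datetimes using the units string."""
    unit, origin = parse_units(units)
    return [origin + timedelta(seconds=o * SECONDS_PER_UNIT[unit]) for o in offsets]

def encode(times, units):
    """Convert datetimes back to numeric offsets in the requested units."""
    unit, origin = parse_units(units)
    return [(t - origin).total_seconds() / SECONDS_PER_UNIT[unit] for t in times]

# Offsets as they would appear when times are not decoded on read.
raw_days = [0.0, 0.5, 1.0]
times = decode(raw_days, "days since 2021-01-01")

# The same instants expressed with other units or another origin.
as_seconds = encode(times, "seconds since 2021-01-01")
as_minutes = encode(times, "minutes since 2020-12-31")
print(times, as_seconds, as_minutes)
```

If the offsets written by one tool look different from those written by another, comparing the two 'units' strings is usually the first thing to check.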

For the data used in the example Notebook: [screenshot: Screen Shot 2022-10-24 at 21 42 49]

But as I said at the top of the Notebook, the format is very close to the output format of Ocean Parcels and OpenDrift, so with a Parcels file the origin might be different. I don't remember whether the default is a constant origin or the start of the experiment.