OpenDrift / opendrift

Open source framework for ocean trajectory modelling
https://opendrift.github.io
GNU General Public License v2.0

IO module for parquet #1259

Closed: poplarShift closed this 3 months ago

poplarShift commented 3 months ago

For long-running simulations with limited particle lifetimes, the in-memory netCDF arrays grow quite large. Also, at least for me, the first operation I do with the netCDF files is usually xr.open_dataset(...).to_dataframe(). This is a first draft of a writer that writes directly to a tabular format.
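To illustrate why a tabular layout helps when particle lifetimes are short, here is a minimal stdlib-only sketch comparing a dense trajectory x time grid (the shape of the netCDF output) with a long format holding one row per active sample. The scenario is hypothetical: one particle is seeded per time step and lives for a fixed number of steps, the stored value is simply the particle age, and None stands in for masked/fill cells. The helpers dense_grid and to_long are illustrative names, not OpenDrift functions.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, int, float]  # (trajectory, time_index, value)


def dense_grid(n_steps: int, lifetime: int) -> List[List[Optional[float]]]:
    """Hypothetical trajectory x time grid; None marks inactive (masked) cells."""
    grid = []
    for traj in range(n_steps):  # particle `traj` is seeded at step `traj`
        row = []
        for t in range(n_steps):
            active = traj <= t < traj + lifetime
            # Store the particle age as a stand-in for a real variable.
            row.append(float(t - traj) if active else None)
        grid.append(row)
    return grid


def to_long(grid: List[List[Optional[float]]]) -> List[Row]:
    """Keep only active cells: one row per (trajectory, time) sample."""
    return [
        (traj, t, value)
        for traj, row in enumerate(grid)
        for t, value in enumerate(row)
        if value is not None
    ]


if __name__ == "__main__":
    grid = dense_grid(n_steps=200, lifetime=20)
    rows = to_long(grid)
    n_cells = sum(len(r) for r in grid)
    print(f"dense cells: {n_cells}, tabular rows: {len(rows)}")
    # prints: dense cells: 40000, tabular rows: 3810
```

The dense grid grows with (number of particles) x (number of time steps) regardless of lifetime, while the long table grows only with the total number of active samples. With xarray, the analogous conversion is the to_dataframe() call mentioned above, after which rows for inactive particles typically hold fill values (NaN) that still have to be filtered out.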

knutfrode commented 3 months ago

Good, it is nice to have a first alternative to the default netCDF writer. I made some modifications with help from @gauteh.

Note that fastparquet has been made an optional dependency (it is not included in environment.yml), and the test is skipped unless the user explicitly installs fastparquet. All tests are now passing, so I am merging this PR.
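The skip-if-not-installed pattern described above can be sketched with the standard library alone, using importlib.util.find_spec and unittest.skipUnless. This is only an illustration of the pattern: OpenDrift's actual test may well use pytest instead (for example pytest.importorskip), and the round-trip test body here is hypothetical. It relies on fastparquet.write and fastparquet.ParquetFile(...).to_pandas(), and on pandas being available whenever fastparquet is, since fastparquet depends on pandas.

```python
import importlib.util
import os
import tempfile
import unittest

# True only if the optional dependency can be imported in this environment.
HAS_FASTPARQUET = importlib.util.find_spec("fastparquet") is not None


class TestParquetRoundTrip(unittest.TestCase):
    @unittest.skipUnless(HAS_FASTPARQUET, "fastparquet is not installed")
    def test_roundtrip(self):
        # Imported inside the test so this module loads without fastparquet.
        import fastparquet
        import pandas as pd

        # Hypothetical tabular output: one row per (trajectory, time) sample.
        df = pd.DataFrame({"trajectory": [0, 0, 1], "lon": [4.0, 4.1, 5.0]})
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "out.parquet")
            fastparquet.write(path, df)
            back = fastparquet.ParquetFile(path).to_pandas()
        # Compare column values only, so index handling does not matter.
        self.assertEqual(back["trajectory"].tolist(), [0, 0, 1])
        self.assertEqual(back["lon"].tolist(), [4.0, 4.1, 5.0])


if __name__ == "__main__":
    unittest.main()
```

Checking for the module at import time keeps the rest of the test suite importable, so users without fastparquet see one skipped test rather than an ImportError.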

poplarShift commented 3 months ago

That was quicker than I expected, thanks! I believe there are still some unused imports that could have been removed; I will make a separate PR for that.

knutfrode commented 3 months ago

Yes, I removed those imports as well and simplified the example a little, but did not mention this in the commit messages. We like tests to be as fast as possible, but that is not critical here, since this test only runs for users who choose to install fastparquet.