asascience-open / xarray-subset-grid

Subset Xarray datasets in space
BSD 3-Clause "New" or "Revised" License

Add benchmark examples #14

Open mpiannucci opened 1 month ago

mpiannucci commented 1 month ago

There are a lot of alternate ways to do the things we are doing with this package. We should document performance and validation with integration test cases.

This will point out performance flaws and places where the package needs to improve.
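As a rough sketch of what such a benchmark case could look like: the harness below times several candidate implementations on the same input and first checks that they agree with each other. Everything here is an assumption for illustration. `benchmark`, `select_loop`, and `select_filter` are hypothetical stand-ins, not functions from this package or from any of the alternatives.

```python
import timeit


def benchmark(candidates, *args, repeat=3, number=10):
    """Time each candidate on the same arguments and check they agree.

    `candidates` maps a label to a callable. The first candidate's output
    is used as the reference that every other candidate must match.
    Returns a dict of label -> best seconds per call.
    """
    results = {}
    reference = None
    for name, func in candidates.items():
        output = func(*args)
        if reference is None:
            reference = output
        elif output != reference:
            raise AssertionError(f"{name} disagrees with the reference output")
        # Best-of-N timing reduces noise from other processes.
        best = min(timeit.repeat(lambda: func(*args), repeat=repeat, number=number))
        results[name] = best / number
    return results


# Hypothetical stand-ins for "two packages doing the same subset".
def select_loop(points, xmin, xmax):
    return [p for p in points if xmin <= p[0] <= xmax]


def select_filter(points, xmin, xmax):
    return list(filter(lambda p: xmin <= p[0] <= xmax, points))


points = [(i * 0.01, 0.0) for i in range(10_000)]
timings = benchmark(
    {"loop": select_loop, "filter": select_filter}, points, 10.0, 20.0
)
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```

In a real benchmark the stand-ins would be replaced by thin wrappers around each package, and the equality check would compare the resulting grids rather than plain lists.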

omkar-334 commented 3 weeks ago

The alternate ways so far are thalassa, xugrid and netCDF4. Are there any more that we can test against?

ChrisBarker-NOAA commented 3 weeks ago

netCDF4 doesn't have anything built-in (and is used by xarray for netCDF files for the most part anyway).

Is this xugrid: https://github.com/Deltares/xugrid ?

thalassa: https://github.com/ec-jrc/thalassa

And there's: UXarray: https://github.com/UXARRAY/uxarray

@mpiannucci: you looked at these when this all started -- do you have any notes as to why you decided not to build on one of them?

omkar-334 commented 3 weeks ago

thalassa has a crop method which is used for subsetting - https://github.com/ec-jrc/Thalassa/blob/master/thalassa/utils.py

ChrisBarker-NOAA commented 3 weeks ago

Thanks -- at a quick glance that looks similar to what Matt's put in this package -- but the question is what surrounds all that. How do the variables associated with the mesh get handled? Can you save out a new dataset that's all complete and correct?

The answer may be yes to all of those -- which is what this issue is about.

But looking at that code, it looks like one more example of code written for a specific end-goal, and maybe not too extendable or adaptable to other uses -- I'm hoping that we can make a clean library here.

Also -- it looks like it can crop to a bounding box - not a polygon, which is less useful, particularly for unstructured meshes.
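To illustrate why the distinction matters, here is a small pure-Python sketch comparing the two selections. The L-shaped polygon and the node coordinates are made up, and the ray-casting test is a generic technique, not the code used by thalassa or by this package.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) is inside the polygon.

    `polygon` is a list of (x, y) vertices; points exactly on an edge
    are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bounding_box(polygon):
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(x, y, bbox):
    xmin, ymin, xmax, ymax = bbox
    return xmin <= x <= xmax and ymin <= y <= ymax


# Hypothetical L-shaped region of interest and a few mesh nodes.
polygon = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
nodes = [(0.5, 0.5), (3.0, 0.5), (0.5, 3.0), (3.0, 3.0), (2.0, 2.0)]

bbox = bounding_box(polygon)
bbox_nodes = [n for n in nodes if in_bbox(*n, bbox)]
poly_nodes = [n for n in nodes if point_in_polygon(*n, polygon)]

print(len(bbox_nodes), "nodes kept by the bounding box")
print(len(poly_nodes), "nodes kept by the polygon")
```

The bounding box keeps all five nodes, while the polygon keeps only the three that actually fall in the L-shaped region. For an irregular coastline the difference would be much larger.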

mpiannucci commented 3 weeks ago

Yeah, so I think what Omkar is trying to figure out is how to tell whether this package is outputting accurate data. One way to do that is to check against how other packages do it, which is what this issue is about.

The other way is to make the calculations by "hand" based on the coordinates which are known outside this context, and then make sure that this library outputs a grid that matches.

That is probably the correct first approach.
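A minimal sketch of that "by hand" check, under some assumptions: the mesh is a made-up strip of two unit squares split into triangles, and `subset_faces` is a hypothetical stand-in for whatever this library would actually be asked to do. The expected nodes and faces were worked out on paper from the coordinates, independently of the function under test.

```python
# Tiny triangular mesh: two unit squares side by side, each split in two.
#   3---4---5
#   | / | / |
#   0---1---2
nodes = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
faces = [(0, 1, 4), (0, 4, 3), (1, 2, 5), (1, 5, 4)]


def subset_faces(nodes, faces, xmin, xmax):
    """Hypothetical stand-in for the library call being validated.

    Keeps faces whose nodes all lie within [xmin, xmax] in x, then
    renumbers the surviving nodes so the face indices stay valid.
    """
    kept = [f for f in faces if all(xmin <= nodes[i][0] <= xmax for i in f)]
    used = sorted({i for f in kept for i in f})
    remap = {old: new for new, old in enumerate(used)}
    new_nodes = [nodes[i] for i in used]
    new_faces = [tuple(remap[i] for i in f) for f in kept]
    return new_nodes, new_faces


# Worked out by hand: x in [0, 1] keeps only the left square, i.e.
# original nodes 0, 1, 3, 4 (renumbered 0..3) and the first two faces.
expected_nodes = [(0, 0), (1, 0), (0, 1), (1, 1)]
expected_faces = [(0, 1, 3), (0, 3, 2)]

new_nodes, new_faces = subset_faces(nodes, faces, 0, 1)
assert new_nodes == expected_nodes
assert new_faces == expected_faces
```

The same pattern should carry over to a real test: build a small grid where the answer is obvious from the coordinates, run the subset, and compare both the coordinates and the connectivity against the hand-derived result.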