GPlates / gplately

GPlately is a Python package to interrogate tectonic plate reconstructions.
https://gplates.github.io/gplately/
GNU General Public License v2.0
58 stars 13 forks source link

the problem with the default extent parameter in gplately.grids.write_netcdf_grid #279

Open michaelchin opened 3 weeks ago

michaelchin commented 3 weeks ago

follow up the AuScope meeting yesterday(2024-09-24)

I may misunderstood the requirements. I just wrote down what I believed was the issue.

I believed that NW would like to preserve the grid extent info from reading in the grid file, and then when writing out the output grid file, the preserved extent info should be applied automatically.

Alternatively, the easy and dirty solution will be just remove the default value [-180,180,-90,90] and force the clients to pay attention.

This matter needs to be thoroughly discussed before coming up with an acceptable solution for our clients.

https://gplates.github.io/gplately/v1.3.0/grids.html#gplately.grids.write_netcdf_grid

jcannon-gplates commented 3 weeks ago

remove the default value [-180,180,-90,90] and force the clients to pay attention

I agree. And have a default of None, meaning the client doesn't have to provide it for specific input grid types - it could instead be obtained from the grid object - I think Ben mentioned possibly allowing an xarray type as the grid.

Also, I don't know if read_netcdf_grid could return a Raster object (which write_netcdf_grid could accept as one of its many allowed input grid types). Although that would make it a little harder to use read_netcdf_grid since it's no longer returning a NumPy array. Alternatively it could return a thin wrapper around the returned grid_z, lon, lat NumPy arrays (and write_netcdf_grid can also accept that wrapper). So the only situation where extent needs specifying is when grid just contains z values.

michaelchin commented 2 weeks ago

@nickywright

just in case NW might like to know the existence of this discussion.

nickywright commented 2 weeks ago

Hi @michaelchin , I'd like the grid extent to be whatever the grid actually is rather than me specifying it. e.g., if my grid is [0,360,-90,90], then it should be that, but if it's [100,180,-50,0] (i.e. not a global netcdf) it should also be that.

I've just had a quick look at the actual write_grid_function and it uses the extent to create regularly spaced the lats and lon - but there is absolutely no reason netcdfs have to have regularly spaced lats and lon coordinates (in fact, I have dealt with many irregularly spaced lat/lon grids before). And it assumes (I think? but I could be wrong) a grid-registration type? But half the grids I deal with are pixel registered (and I thought the default in xarray is pixel registered). So really, what I would like is the actual coordinates of my grids (the lats and lons of my raster) to be written out.

michaelchin commented 2 weeks ago

Hi @michaelchin , I'd like the grid extent to be whatever the grid actually is rather than me specifying it. e.g., if my grid is [0,360,-90,90], then it should be that, but if it's [100,180,-50,0] (i.e. not a global netcdf) it should also be that.

I've just had a quick look at the actual write_grid_function and it uses the extent to create regularly spaced the lats and lon - but there is absolutely no reason netcdfs have to have regularly spaced lats and lon coordinates (in fact, I have dealt with many irregularly spaced lat/lon grids before). And it assumes (I think? but I could be wrong) a grid-registration type? But half the grids I deal with are pixel registered (and I thought the default in xarray is pixel registered). So really, what I would like is the actual coordinates of my grids (the lats and lons of my raster) to be written out.

@GPlates/gplately-dev Any ideas about this?

jcannon-gplates commented 2 weeks ago

So really, what I would like is the actual coordinates of my grids (the lats and lons of my raster) to be written out.

I think having read_netcdf_grid return something like a thin wrapper around its grid_z, lon, lat (that write_netcdf_grid also accepts as input, among other types like xarray) should help here. Maybe return a class with those attributes (and ignore the return_grids argument of read_netcdf_grid since now always returning lat/lon coords). Then write_netcdf_grid can write out the actual lon/lat coordinates (and find their min/max range).

And if we're worried about backward compatibility then could add an argument to read_netcdf_grid that defaults to returning what it currently returns (in which case it would respect the return_grids argument), but user can explicitly specify the new argument to instead return the new thin wrapper - which can then get passed to write_netcdf_grid. But the problem with backward compatibility is new users won't get the new desired behaviour by default - in which case maybe we just break old users' code, forcing them to update it. I guess we're already breaking their code if we have extent default to None (because old code is likely passing only grid_z from read_netcdf_grid to write_netcdf_grid, which requires extent to be specified).

I hope that wasn't confusing. I think I just confused myself :wink:

And it assumes (I think? but I could be wrong) a grid-registration type?

I'm not sure the internal details, but could have an extra argument (to write_netcdf_grid) to specify pixel-vs-grid registration. But could default to None such that each input type has its own default. For example, xarray could default to pixel registered (or whatever that library defaults to), and a regular z/lon/lat input could default to grid registered.

brmather commented 2 weeks ago

Hi team, a lot to unpack here. I'm going to use bold font to try and structure my thoughts on this.

I think the read_netcdf_grid function is fine as it is, since the user can already specify the lon, lat coordinates to be returned via the return_grids=True flag. If you pass the filename to gplately.Raster(path/to/grid.nc) then it returns a Raster object negating the need for an additional wrapper on the read_netcdf_grid output.

The write_netcdf_grid function could probably use some work. Perhaps it would help if there were an option for the user to supply the lon, lat coordinates. For example, matplotlib.pyplot.pcolormesh will accept a single 2D array or a list of coordinates and their data values:

pcolormesh([X, Y,] C, **kwargs)

So perhaps we can do something similar here,

write_netcdf_grid(filename, [lons, lats], grid, extent=None, significant_digits=None, fill_value=None)

If lons, lats are not provided, then it looks for an extent, if no extent is provided then it either assumes a global extent or fails. Grid or pixel registration can be an extra argument (or maybe this can be determined from the size of lon, lat arrays? Not sure.)

How does that sound?

jcannon-gplates commented 2 weeks ago

That all sounds good to me 👍. I like your symmetry between read_netcdf_grid and write_netcdf_grid (regarding the optional lons/lats).

I think I was trying to make the default case have the lons/lats passed from read_netcdf_grid to write_netcdf_grid. But you're right, the user can still do that - they just need to specify return_grids=True. And, as you noted, there's also Raster (with its save_to_netcdf4 updated to match the new write_netcdf_grid).

Interesting trick they do with pcolormesh([X, Y,] C, **kwargs) to enable optional [X, Y] to be before required C - I've not seen that ordering before - looks like the actual signature is pcolormesh(*args, **kwargs) and they manually parse args.

michaelchin commented 1 week ago

Please @ me if this needs my attention. I am overwhelmed by other things right now. And people may get furious if I missed anything. I don't want to upset anyone... I am a nice guy...(well, this is another joke attempt while wind is blowing in the east. )

michaelchin commented 1 week ago

@nickywright I am not sure if @jcannon-gplates and @brmather 's solution is satisfactory for you. I don't have time to look into this issue right now. I may find time later to deal with this issue though. If you need my immediate attention with this issue, please @ me and let me know. Thanks.

nickywright commented 3 days ago

Hi @michaelchin.

I don't read in the grids using the read_netcdf_grid since you don't need to for some things so I can't comment on the symmetry between the functions, but I think this is ok,

For reference, I've been doing something like:

raster = gplately.Raster(data='aaa.nc', time=15, plate_reconstruction=M08_rotation_model)
reconstructed_raster = raster.reconstruct(0, threads=4, partitioning_features=M08_static_polygons)

# write out output:
# reconstructed_raster.save_to_netcdf4(outfile)   # old way?
gplately.grids.write_netcdf_grid(filename=outfile, grid=reconstructed_raster._data, extent=[reconstructed_raster._lons.min(), reconstructed_raster._lons.max(), reconstructed_raster._lats.min(), reconstructed_raster._lats.max()])

So I think this fairly similar, I would just specify the lat/lon arrays directly (which is safer than the min/max in the extent IMO).

michaelchin commented 3 days ago

Hi @michaelchin.

I don't read in the grids using the read_netcdf_grid since you don't need to for some things so I can't comment on the symmetry between the functions, but I think this is ok,

For reference, I've been doing something like:

raster = gplately.Raster(data='aaa.nc', time=15, plate_reconstruction=M08_rotation_model)
reconstructed_raster = raster.reconstruct(0, threads=4, partitioning_features=M08_static_polygons)

# write out output:
# reconstructed_raster.save_to_netcdf4(outfile)   # old way?
gplately.grids.write_netcdf_grid(filename=outfile, grid=reconstructed_raster._data, extent=[reconstructed_raster._lons.min(), reconstructed_raster._lons.max(), reconstructed_raster._lats.min(), reconstructed_raster._lats.max()])

So I think this fairly similar, I would just specify the lat/lon arrays directly (which is safer than the min/max in the extent IMO).

Thanks @nickywright

This is very helpful.

I agree with you that the GPlately should be able to get the grid extent from aaa.nc and use this extent to save the new grid file.