Closed snowman2 closed 3 months ago
Thanks @justingruca :+1:
@snowman2 Sorry, long midsummer week-end here in Sweden, so I couldn't check this in time. This PR is actually introducing backwards incompatible changes as the gcps are now a string (dumped json) instead of a geojson dictionary. We could discuss which is the best, but we should at least have a deprecation cycle for this change, right?
This change actually breaks our code which expects gcps a dictionary. Looks like the test here was changed just for that purpose... https://github.com/corteva/rioxarray/pull/787/files#diff-11692352ce364dc8d376f73717ca1390e291d39a8f8de80526ada58ccdf5187cR2990
This change actually breaks our code which expects gcps a dictionary.
Apologies for that. I didn't realize you were reading the GCPs directly from the file instead of through the rioxarray interface. To help me better understand your use case, why doesn't the code use rio.get_gcps()
?
This PR is actually introducing backwards incompatible changes as the gcps are now a string (dumped json) instead of a geojson dictionary. We could discuss which is the best, but we should at least have a deprecation cycle for this change, right?
Yes, a deprecation cycle would have been better. I just didn't foresee any potential breakages and figured it would be okay to proceed with the change :upside_down_face:
@mraspaud are there file formats you can write a dict to?
We are using the "gcps" attribute of the "spatial_ref" coordinate, not the get_gcps
function, see here https://github.com/pytroll/satpy/blob/df4a1baef4594a0fa8b84f49fea2d7f35e37b979/satpy/readers/sar_c_safe.py#L667
And it's not about the writing, I totally see the point of writing the serialised json to file.
But the type of that attribute changed. Maybe the serialisation should happen later in the chain?
Maybe the serialisation should happen later in the chain?
There currently aren't any hooks to make that automagically happen before the .to_netcdf
call.
Would there be any issues changing this line:
gcps = self._data.coords["spatial_ref"].attrs["gcps"]
to:
gcps = self._data.rio.get_gcps()
Would need to change this:
gcps = self._data.coords["spatial_ref"].attrs["gcps"]
crs = self._data.rio.crs
gcp_list = [(feature["properties"]["row"], feature["properties"]["col"], *feature["geometry"]["coordinates"])
for feature in gcps["features"]]
gcp_array = np.array(gcp_list)
ypoints = np.unique(gcp_array[:, 0])
xpoints = np.unique(gcp_array[:, 1])
gcp_lons = gcp_array[:, 2].reshape(ypoints.shape[0], xpoints.shape[0])
gcp_lats = gcp_array[:, 3].reshape(ypoints.shape[0], xpoints.shape[0])
gcp_alts = gcp_array[:, 4].reshape(ypoints.shape[0], xpoints.shape[0])
rio_gcps = [rasterio.control.GroundControlPoint(*gcp) for gcp in gcp_list]
return (xpoints, ypoints), (gcp_lons, gcp_lats, gcp_alts), (rio_gcps, crs)
to:
gcps = self._data.rio.get_gcps()
crs = self._data.rio.crs
gcp_array = np.array([gcp.row, gcp.col, gcp.x, gcp.y, gcp.z for gcp in gcps])
ypoints = np.unique(gcp_array[:, 0])
xpoints = np.unique(gcp_array[:, 1])
gcp_lons = gcp_array[:, 2].reshape(ypoints.shape[0], xpoints.shape[0])
gcp_lats = gcp_array[:, 3].reshape(ypoints.shape[0], xpoints.shape[0])
gcp_alts = gcp_array[:, 4].reshape(ypoints.shape[0], xpoints.shape[0])
return (xpoints, ypoints), (gcp_lons, gcp_lats, gcp_alts), (gcps, crs)
Would there be any issues changing this line:
gcps = self._data.coords["spatial_ref"].attrs["gcps"]
to:
gcps = self._data.rio.get_gcps()
In principle, yes, it is possible.
However, the idea was to abstract from the underlying type and thus use the more interoperable geojson instead. That's why, when I implemented the gcp support, I made sure the geojson dictionary was available through the coordinate attribute.
But at the same time, I understand that this makes it difficult to have in a DataArray, as xarray doesn't know how to serialize dicts, and doesn't provide a mechanism for custom encoding of attributes.
As a compromise, would it be possible to implement a get_gcps as returning the geojson dict instead, and maybe have an alternative get_gdal_gcps to return the underlying gcp objects?
Regarding this PR, do you think we could reverse it for now, until the other functionalities are implemented?
First of all, I apologize for the trouble of this MR. I would handle it differently if I could go back. Now that it is added, I hesitate to revert it for the reasons below.
To help the discussion, I wanted to highlight two elements that are part of the design philosophy of rioxarray:
Storing the information as a dictionary inside of spatial_ref
breaks the first element. I didn't catch that when you originally added this and was surprised when it came up in #778.
Modifying get_gcps
to return a GeoJSON object breaks the second element. It will make it harder to pass GCPs around to the rasterio methods.
One workaround could be to do:
if isintance(gcps, str):
gcps = json.loads(gcps)
Then, could cache that elsewhere.
Alternatively, get_gcps
could be modified to check for a string/dict and you could write gcps
as a dict after the initial load.
As someone who is mildly effected by this PR (Martin's Satpy code failing in Satpy CI), I unfortunately for Martin think I agree with @snowman2. One "hack" we've benefited from in some places of Satpy is putting python objects or complex structures inside of Xarray objects' attrs
or coords
. I/we found out late in Satpy's development that it is generally discouraged or even not supported by Xarray to store things this way and should be avoided if at all possible. I remember an Xarray PR a while ago that took at least one xarray release cycle to get fixed because we were using Xarray in a way the xarray maintainers did not anticipate.
So I think the rioxarray philosophy element 1 is inline with this idea of avoiding complex structures (or python objects) inside Xarray objects.
One workaround could be to do:
if isintance(gcps, str): gcps = json.loads(gcps)
Then, could cache that elsewhere. Alternatively,
get_gcps
could be modified to check for a string/dict and you could writegcps
as a dict after the initial load.
Yes, I suppose that's the logical solution to keep gdal/rasterio encapsulated.
I'll try to make time before my holiday next week to change this, and to make a new release of satpy.
@snowman2 thanks for explaining the guiding principles of rioxarray, let's hope I can monitor my github notifications better in the future not to miss any urgent pings :)
Hopefully this won't be a regular occurrence 😅
docs/history.rst
for all changes anddocs/rioxarray.rst
for new APINote: No need to be backwards compatible because writing to netCDF didn't work previously and all other scenarios would have been in-memory only or to GeoTiff, which should work with this change.
@mraspaud