Open Geosynopsis opened 5 years ago
Definitely! Looks like you have some great additions. It also adds opportunities for new vector/raster integrations.
So, I think a discussion of how best to integrate xgeo and rioxarray would be a good idea.
There are several different ways to proceed, so I will just do a brain dump on my initial thoughts:
xgeo could be the geopandas tool that uses the rioxarray extension for rasterio-like functionality. With this approach, the accessor could be xgeo or something similar, to prevent conflicts with the geo accessor planned for geoxarray.
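To illustrate the accessor-naming point, here is a minimal sketch of registering a separately named accessor through xarray's public extension hook. The accessor name `xgeo_demo` and the `center_bounds` property are hypothetical and chosen only for this example; they are not taken from xgeo, rioxarray, or geoxarray.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor name, picked so it cannot collide with "geo" or "rio".
@xr.register_dataarray_accessor("xgeo_demo")
class XGeoDemoAccessor:
    def __init__(self, xarray_obj):
        # xarray passes the DataArray the accessor is attached to.
        self._obj = xarray_obj

    @property
    def center_bounds(self):
        """Return (min_x, min_y, max_x, max_y) of the cell-center coordinates."""
        x = self._obj["x"].values
        y = self._obj["y"].values
        return float(x.min()), float(y.min()), float(x.max()), float(y.max())


da = xr.DataArray(
    np.zeros((2, 3)),
    dims=("y", "x"),
    coords={"x": [0.0, 1.0, 2.0], "y": [10.0, 20.0]},
)
bounds = da.xgeo_demo.center_bounds
```

Because each package registers its own name, a user could have several accessors attached to the same DataArray without one overriding another.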
The benefit of this approach is that the dependency list would be targeted specifically to the use case: geopandas could become required for xgeo if needed, while users of rioxarray won't need the geopandas package installed.
If this approach is taken, then we could discuss changes to the rio extension API and the updates in rioxarray needed to make it useful for the geopandas additions in xgeo. That would give both packages a common engine.
https://github.com/corteva/geocube
Currently, the geocube toolset already has geopandas as a dependency. So, adding an extension here wouldn't require any changes to the dependencies. However, the downsides are that the xgeo extension would be buried in the geocube code and it would add additional and unnecessary dependencies (such as datacube).
Also, as a side note @Geosynopsis, I noticed that you have code for CF to CRS conversions.
This may be of interest to you: https://pyproj4.github.io/pyproj/v2.2.0rel/api/crs.html#pyproj.crs.CRS.to_cf https://pyproj4.github.io/pyproj/v2.2.0rel/api/crs.html#pyproj.crs.CRS.from_cf
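As a rough sketch of what those two pyproj methods do, the snippet below exports a CRS to CF grid-mapping attributes and rebuilds a CRS from them. The EPSG code is an arbitrary choice for illustration, and the exact set of keys in the returned dictionary can vary between pyproj versions.

```python
from pyproj import CRS

# Arbitrary example CRS: WGS 84 / UTM zone 15N.
utm = CRS.from_epsg(32615)

# Export to a dict of CF-convention grid-mapping attributes.
cf_attrs = utm.to_cf()

# Rebuild a CRS object from those attributes.
restored = CRS.from_cf(cf_attrs)
```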
I've never actually used geopandas so I'm not sure of the overlap, but would having geopandas as an optional dependency for geoxarray or rioxarray make sense?
My hope is to keep geoxarray fairly simple given how much work has been put into pyproj for handling CF conversion and WKT <-> PROJ4 <-> others. I was hoping it could have some resampling interface to rasterio or pyresample if needed. Overall I was thinking geoxarray would help manage how users define their geolocation information (crs coordinate, lons/lats 2D coordinates, x/y coordinates, etc.) and help users get that information out to be used elsewhere.
It sounds like we have three distinct, but not completely separate use cases (rasterio versus geopandas versus simple). Maybe this isn't the place, but @snowman2 do you see a reason to use rasterio's CRS object over pyproj's when assigning CRS information to a DataArray/Dataset?
I've never actually used geopandas so I'm not sure of the overlap, but would having geopandas as an optional dependency for geoxarray or rioxarray make sense?
geopandas is a powerful interface for doing geospatial operations with vector/shape data, so it makes sense if you are interested in using shapefile data with raster data. But it is quite a heavy dependency (it adds fiona, shapely (GEOS), pyproj (PROJ), and optionally rtree (libspatialindex) to the stack). I know you mention having it as a possible optional dependency, but I currently like the idea of putting the geopandas-like functionality in its own package, as that clarifies the functionality of the package and its dependencies. Also, I currently like rioxarray scoped to rasterio-like functionality, with rasterio and xarray (and scipy) as dependencies at the moment. It keeps the scope and functionality of the project in line with the project name and makes installation simpler (and hopefully less confusing). I may need to sleep on this one and see how I feel about it later as I may have some holes in my thinking :).
do you see a reason to use rasterio's CRS object over pyproj's when assigning CRS information to a DataArray/Dataset?
I think pyproj.CRS has a simpler dependency list and has more features/functionality (it supports from_cf/to_cf). The only thing to be careful about is that it defaults to WKT2 when exporting with to_wkt. I am not sure exactly which version of GDAL begins to support WKT2, but rasterio is currently limited to GDAL<3.0 for the time being.
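A small sketch of the WKT caveat, assuming a pyproj version whose to_wkt accepts a version argument: the default export is a WKT2 string, while passing "WKT1_GDAL" requests the older GDAL-style WKT1 that pre-3.0 GDAL stacks expect. The EPSG code is only an example.

```python
from pyproj import CRS

crs = CRS.from_epsg(4326)

# Default export: a WKT2 flavour (the exact revision depends on the pyproj version).
wkt_default = crs.to_wkt()

# Explicitly request GDAL-style WKT1 for consumers that cannot parse WKT2.
wkt_gdal = crs.to_wkt(version="WKT1_GDAL")
```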
My hope is to keep geoxarray fairly simple ...
That would definitely be useful. With this thinking in mind, I am wondering if geoxarray will be a standardizer for geospatial python/xarray packages? If its dependencies are pretty small, maybe it could be a base for rioxarray as far as retrieving CRS and other geospatial information and writing them back to the xarray dataset.
It would be nice if our three libraries (if they stay as 3) could use the same naming and object types for coordinate variables at least. I see crs as an issue, especially with xarray's open_rasterio using rasterio's CRS object, but I feel like rasterio/gdal are really big dependencies to force on people.
I just saw your comment about geoxarray being a base for rioxarray (and possibly xgeo). I think that would be the long term goal, but given how slow it's been for me to get a real package out, maybe for now we can only maintain similar naming to make a collaboration easier in the future.
...Or you could propose changes to geoxarray to support what you need in rioxarray and we could release something?
Or you could propose changes to geoxarray to support what you need in rioxarray and we could release something
Sounds like a good idea. I will think on this.
@djhoese @snowman2 As far as I understood the initial motive, it would be great to consolidate the libraries if possible. If you look at the design pattern of xarray itself, xarray optionally relies on many libraries like dask, rasterio, or netcdf. So IMO, it makes sense to consolidate the libraries and give users the option to tune the dependencies based on the operations they want to use. That way, we can have a combined workforce on maintaining a single library.
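The optional-dependency pattern mentioned above can be sketched with the standard library alone. The helper names `has_module` and `require` are hypothetical and are not taken from xarray or any of the packages discussed here; the idea is only that a heavy package is imported when a feature that needs it is actually used.

```python
import importlib
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def require(name, feature):
    """Import an optional dependency, or fail with a message naming the feature."""
    if not has_module(name):
        raise ImportError(
            f"'{name}' is required for {feature}; install it to use this feature."
        )
    return importlib.import_module(name)


# Example: a vector-clipping feature would call this lazily, so users who
# only need raster functionality never have to install geopandas.
# gpd = require("geopandas", "vector clipping")
```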
@Geosynopsis, you are indeed correct that we want to consolidate functionality where it makes sense. You also bring up a good example of a well-organized project with optional dependencies.
I am currently thinking that combining these libraries could happen in stages. My initial thoughts are that in stage 1 we can combine the pieces of xgeo and rioxarray that are rasterio-only into rioxarray, while keeping the design friendly for xgeo. Then, xgeo can use rioxarray in its code base for the rasterio part. In stage 2, rioxarray will update to use the geolocation management pieces from geoxarray, and xgeo will get these updates for free.
After stage 2 is complete, I think we will be in a better place to decide if and how to better consolidate. In the end, it all comes down to what the scope of projects should be. Keeping them separate is also a good option and examples of doing so are in related xarray projects and django extensions.
That way, we can have a combined workforce on maintaining a single library.
I think either way we organize it we can have a combined workforce working towards the same goal.
But, I think having time to think about it would be a good idea too (at least for me :)).
@shaharkadmiel, I figured I should add you here for the discussion of collaboration with geo accessors. https://github.com/pydata/xarray/issues/3482 https://github.com/shaharkadmiel/rasterx
Hey @snowman2, I have also been playing around with xarray for geospatial functionality, which you can access at xgeo. As rightfully pointed out by @djhoese in xarray thread 2228, maybe we can collaborate.