corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray

Regarding the collaboration on the library #4

Open Geosynopsis opened 5 years ago

Geosynopsis commented 5 years ago

Hey @snowman2, I have also been playing around with xarray for geospatial functionality, which you can access at xgeo. As rightfully pointed out by @djhoese in xarray thread 2228, maybe we can collaborate.

snowman2 commented 5 years ago

Definitely! Looks like you have some great additions. It also adds opportunities for new vector/raster integrations.

So, I think a discussion for how best to integrate xgeo and rioxarray would be a good idea.

There are several different ways to proceed, so I will just do a brain dump on my initial thoughts:

1. xgeo uses rioxarray as engine

xgeo could be the geopandas tool that uses the rioxarray extension for rasterio-like functionality. With this approach, the accessor could be xgeo or something to prevent conflicts with the geo accessor planned for geoxarray.

The benefits of this approach would be that it would make the dependency list targeted specifically to the use case and geopandas could become required for xgeo if needed while users of rioxarray won't need the geopandas package installed.

If this approach is taken, then we could discuss changes to the rio extension API and updates in rioxarray needed to make it useful for the geopandas additions in xgeo. This would be useful to have a common engine.

2. Add the geopandas functionality into geocube

https://github.com/corteva/geocube

Currently, the geocube toolset already has geopandas as a dependency. So, adding in an extension here wouldn't require any changes to the dependencies. However, the downsides are that the xgeo extension would be buried in the geocube code and it would add additional and unnecessary dependencies (such as datacube).

3-? .... Other ideas welcome
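To make option 1 a bit more concrete, here is a minimal sketch of how a separately named accessor could be registered on a DataArray without colliding with `rio` or `geo`. The accessor name `xgeo_sketch` and the `bounds` method are hypothetical and only for illustration; a real xgeo accessor would presumably delegate to the `rio` accessor instead of reading the coordinates itself.

```python
import numpy as np
import xarray as xr


# Hypothetical accessor name, chosen so it cannot clash with the "rio"
# accessor from rioxarray or a "geo" accessor planned for geoxarray.
@xr.register_dataarray_accessor("xgeo_sketch")
class XGeoSketchAccessor:
    def __init__(self, data_array):
        self._obj = data_array

    def bounds(self):
        """Return (min_x, min_y, max_x, max_y) of the x/y cell centers."""
        x = self._obj["x"].values
        y = self._obj["y"].values
        return (float(x.min()), float(y.min()), float(x.max()), float(y.max()))


# Small in-memory raster with x/y coordinates (assumed dimension names).
da = xr.DataArray(
    np.zeros((2, 3)),
    dims=("y", "x"),
    coords={"y": [10.0, 20.0], "x": [1.0, 2.0, 3.0]},
)
print(da.xgeo_sketch.bounds())
```

Because the accessor lives under its own name, both packages could be imported in the same session without overriding each other.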

snowman2 commented 5 years ago

Also, as a side note @Geosynopsis, I noticed that you have code for CF to CRS conversions.

This may be of interest to you: https://pyproj4.github.io/pyproj/v2.2.0rel/api/crs.html#pyproj.crs.CRS.to_cf https://pyproj4.github.io/pyproj/v2.2.0rel/api/crs.html#pyproj.crs.CRS.from_cf
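For reference, a short sketch of the two linked methods, assuming pyproj 2.2 or later is installed. EPSG:4326 is just an example CRS.

```python
from pyproj import CRS

crs = CRS.from_epsg(4326)

# Export the CRS as a dict of CF grid-mapping attributes.
cf_attrs = crs.to_cf()
print(cf_attrs["grid_mapping_name"])

# Rebuild a CRS object from the CF attributes.
restored = CRS.from_cf(cf_attrs)
print(restored.is_geographic)
```

The dict returned by `to_cf` can be stored as the attributes of a grid-mapping variable in a dataset.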

djhoese commented 5 years ago

I've never actually used geopandas so I'm not sure of the overlap, but would having geopandas as an optional dependency for geoxarray or rioxarray make sense?

My hope is to keep geoxarray fairly simple given how much work has been put in to pyproj with handling CF conversion and WKT <-> PROJ4 <-> others. I was hoping it could have some resampling interface to rasterio or pyresample if needed. Overall I was thinking geoxarray would help manage how users define their geolocation information (crs coordinate, lons/lats 2D coordinates, x/y coordinates, etc) and help users get the information to be used elsewhere.

It sounds like we have three distinct, but not completely separate use cases (rasterio versus geopandas versus simple). Maybe this isn't the place, but @snowman2 do you see a reason to use rasterio's CRS object over pyproj's when assigning CRS information to a DataArray/Dataset?

snowman2 commented 5 years ago

I've never actually used geopandas so I'm not sure of the overlap, but would having geopandas as an optional dependency for geoxarray or rioxarray make sense?

geopandas is a powerful interface for doing geospatial operations with vector/shape data, so it makes sense if you are interested in using shapefile data with raster data. But it is quite a heavy dependency: it adds fiona, shapely (GEOS), pyproj (PROJ), and optionally rtree (libspatialindex) to the stack. I know you mention having it as a possible optional dependency, but I currently like the idea of putting the geopandas-like functionality in its own package, as it clarifies the functionality of the package and the dependencies. Also, I currently like rioxarray with the scope of rasterio-like functionality, with rasterio and xarray (and scipy) as dependencies at the moment. It keeps the scope and functionality of the project in line with the project name and makes installation simpler (and hopefully less confusing). I may need to sleep on this one and see how I feel about it later as I may have some holes in my thinking :).

do you see a reason to use rasterio's CRS object over pyproj's when assigning CRS information to a DataArray/Dataset?

I think pyproj.CRS has a simpler dependency list and has more features/functionality (it supports from_cf/to_cf). The only thing to be careful about is that it defaults to WKT2 when exporting to_wkt. I am not sure exactly what version of GDAL begins to support WKT2, but rasterio is currently limited to GDAL<3.0 for the time being.
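As a sketch of the WKT point, assuming pyproj 2.2 or later: the default export is WKT2, and an older GDAL-style WKT1 string can be requested explicitly through the `version` argument.

```python
from pyproj import CRS
from pyproj.enums import WktVersion

crs = CRS.from_epsg(4326)

# Default export uses a WKT2 flavor.
wkt2 = crs.to_wkt()

# Explicitly request the GDAL-style WKT1 flavor for older consumers.
wkt1 = crs.to_wkt(version=WktVersion.WKT1_GDAL)

# Print only the root keyword of each string to show the difference.
print(wkt2.split("[", 1)[0])
print(wkt1.split("[", 1)[0])
```

Passing the WKT1 string to a consumer built against GDAL<3.0 avoids relying on WKT2 support there.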

snowman2 commented 5 years ago

My hope is to keep geoxarray fairly simple ...

That would definitely be useful. With this thinking in mind, I am wondering if geoxarray will be a standardizer for geospatial python/xarray packages? If its dependencies are pretty small, maybe it could be a base for rioxarray as far as retrieving CRS and other geospatial information and writing them back to the xarray dataset.

djhoese commented 5 years ago

It would be nice if our three libraries (if they stay as 3) could use the same naming and object types for coordinate variables at least. I see crs as an issue, especially with xarray's open_rasterio using rasterio's CRS object, but I feel like rasterio/gdal are really big dependencies to force on people.

I just saw your comment about geoxarray being a base for rioxarray (and possibly xgeo). I think that would be the long term goal, but given how slow it's been for me to get a real package out, maybe for now we can only maintain similar naming with a view to collaborating in the future.

...Or you could propose changes to geoxarray to support what you need in rioxarray and we could release something?

snowman2 commented 5 years ago

Or you could propose changes to geoxarray to support what you need in rioxarray and we could release something

Sounds like a good idea. I will think on this.

Geosynopsis commented 5 years ago

@djhoese @snowman2 As far as I understood the initial motive, it would be great to consolidate the libraries if possible. If you look at the design pattern of xarray itself, it optionally relies on many libraries such as dask, rasterio, or netcdf. So IMO, it makes sense to consolidate the libraries while giving users the option to tune the dependencies based on the operations they want to use. That way, we can have a combined workforce maintaining a single library.
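One common way to get this kind of optional dependency is a lazy import that only fails when the feature is actually used. The sketch below assumes nothing about how xarray implements it; `import_optional` and `read_vector` are hypothetical helper names.

```python
import importlib


def import_optional(module_name, feature):
    """Import module_name on demand, or raise an error naming the feature."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'. "
            "Install it to enable this feature."
        ) from error


def read_vector(path):
    # geopandas is imported only when this function is called, so users
    # of the raster-only functionality never need it installed.
    geopandas = import_optional("geopandas", "read_vector")
    return geopandas.read_file(path)
```

With this pattern the package itself imports cleanly without geopandas, and the error message tells the user what to install only when a vector operation is requested.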

snowman2 commented 5 years ago

@Geosynopsis, you are indeed correct that we want to consolidate functionality where it makes sense. You also bring up a good example of a well-organized project with optional dependencies.

I am currently thinking that combining these libraries could happen in stages. My initial thought is that in stage 1 we combine the pieces of xgeo and rioxarray that are rasterio-only into rioxarray, while keeping the design friendly for xgeo. Then, xgeo can use rioxarray in its code base for the rasterio part. In stage 2, rioxarray will update to use the geolocation management pieces from geoxarray, and xgeo will get these updates for free.

After stage 2 is complete, I think we will be in a better place to decide if and how to better consolidate. In the end, it all comes down to what the scope of projects should be. Keeping them separate is also a good option and examples of doing so are in related xarray projects and django extensions.

That way, we can have a combined workforce on maintaining a single library.

I think either way we organize it we can have a combined workforce working towards the same goal.

But, I think having time to think about it would be a good idea too (at least for me :)).

snowman2 commented 5 years ago

@shaharkadmiel, I figured I should add you here for the discussion of collaboration with geo accessors. https://github.com/pydata/xarray/issues/3482 https://github.com/shaharkadmiel/rasterx