Closed benbovy closed 1 month ago
Yes I've been working further on this (last updates not uploaded yet) but that might be too much of a change indeed. I'll restrict this PR to the refactoring of Geography wrapper classes.
Also I'm not exactly sure whether we should closely reuse Shapely's API for creation functions. I find that API convoluted with the mixing of a unique coords
argument vs. separate x
, y
(and z
) arguments as well as the different behavior depending on indices
. We should probably reuse the same API for compatibility although it makes things more cumbersome to implement.
Yeah, the vectorized creation functions in shapely are quite complex, and I am also not sure if they are always that useful (like creating polygons from a single 3D array, which I think in practice is only useful to generate some random triangles or to create bboxes), and agreed we should carefully think about which API we exactly want here. That's also the reason I mentioned leaving out the vectorized creation functions for now, and only providing the scalars ones (that already exist and that are used in the tests). For points, I would keep the vectorized function you already have, since in this case it is quite simple, I think? (and you already have that)
For points, I would keep the vectorized function you already have, since in this case it is quite simple, I think? (and you already have that)
Yes. For a single array this is currently limited to 2D (i.e., not a true ufunc like in shapely) but that's fine for now. I don't think that we can use pybind11.vectorize
to extend support to a single nD input array (https://github.com/pybind/pybind11/issues/1294).
I wouldn't be too hard I think to mimic the behavior of a ufunc by relying on pybind11's array view & indexing capabilities (+ some manual computation of nD indices), though. That would be even easier to let xtensor(-python) takes care of all of that (also for creating temporary arrays of geographies on the C++ side while releasing the GIL), but I'm not sure if adding such build dependencies is worth it.
Nice progress!
One thought on the API: given that we don't follow exact shapely anymore with the subclass constructors (shapely.Point(..)
etc), we could also follow the naming of s2 and bigquery and others for the scalar creation functions, and use something like make_point()
instead of point()
, etc.
(although I see they are also not consistent in it, as they have st_geogpoint
and st_makeline
/st_makepolygon
...)
(I am not necessarily convinced myself that this is better (it's longer to type, but tab completion will put them all together, and there is a clearer distinction from being a type), mostly just mentioning it in case you would have a stronger opinion)
Thanks for the review comments @jorisvandenbossche.
I also have been thinking about the naming... Likewise I'm still not fully convinced myself and I don't have strong opinions either.
We could also use the create_
prefix for what we call "geography creation" functions...
This is getting close. I'd just like to investigate the segfault when creating nested geography collections before making this PR ready (EDIT: fixed segfault).
A few other comments that we can address later:
spherely.Geography
wrapper, but it depends on what we want to do.GeographyType::GeographyCollection
but both the Geography repr and WKT have the GEOMETRYCOLLECTION
name. Do we want to uniformize things and rename GeographyType::GeographyCollection
to GeographyType::GeometryCollection
?s2geography::PolygonGeography
reuse the same underlying S2Polygon
for multi-polygons. Maybe if we deactivate s2 normalization and/or validity check it will pass but will it work ?point()
, linestring()
, etc. or create_point()
, create_linestring()
, etc.This is getting close.
A few other comments that we can address later:
Agreed! Let's indeed get this in (to have the basics working, and will also be easier for other PRs adding features), and the things below can be handled in follow-ups.
- Empty geographies: there is no overload function to create MULTI geographies without argument (or with None) ; also when empty or 1-length sequences are given the multi-types are demoted (see Singleton Multi-types automatically demoted paleolimbot/s2geography#17)
Personally I am fine with that limitation for now, especially given that s2geography
essentially does not make the distinction between single and multi geometries anyway (as you linked), and that it is only for output (like WKT, or asking for the type) that the distinction is introduced, based on the number of elements stored in the object.
- The collection type is named
GeographyType::GeographyCollection
but both the Geography repr and WKT have theGEOMETRYCOLLECTION
name. Do we want to uniformize things and renameGeographyType::GeographyCollection
toGeographyType::GeometryCollection
?
I would maybe lean towards GeographyType::GeometryCollection
, although that is certainly odd as well ..
Another case where this occurs is in the spherely.geography_collection(..)
constructor. For that case, we could also shorten it to just collection()
to avoid that issue? (or at least when we add a prefix like create_..
, that might be sufficient)
- It is not currently possible to create a MULTIPOLYGON with duplicate rings, due to the fact that
s2geography::PolygonGeography
reuse the same underlyingS2Polygon
for multi-polygons. Maybe if we deactivate s2 normalization and/or validity check it will pass but will it work ?
A multipolygon with duplicate rings also sounds like "wrong" .. (in shapely / GEOS, that would be an invalid geometry anyway, and thus give unexpected results), so I am personally fine with not allowing to create it.
Although I suppose at some point we should allow to create invalid geographies, end then have a method to check if they are valid / what the reason is for the invalidity (for example s2_make_polygon
has a check
keyword to disable validation when constructing a polygon)
This is also another case (together with the geometrycollection / geographycollection above) where it points the the mismatch between the "simple features" model we try to emulate and how s2 works. For the constructor functions you implemented here, it is nice to make the distinction between all the types, I think (as it gives a more user friendly interface), but for accessors like the geography type, we could also stay more closely to s2geography and have a function that just returns point/line/polygon/collection
- Scalar creation function naming: still no strong opinion, although I would lean towards either
point()
,linestring()
, etc. orcreate_point()
,create_linestring()
, etc.
Let's open a separate issue for this and see if we can get some more opinions
I would maybe lean towards GeographyType::GeometryCollection, although that is certainly odd as well ..
Another case where this occurs is in the spherely.geography_collection(..) constructor. For that case, we could also shorten it to just collection() to avoid that issue?
I renamed GeographyType::GeographyCollection
-> GeographyType::GeometryCollection
and geography_collection()
to collection()
, I also think it looks nicer for now.
OK, merged this, so we can move forward with other PRs and follow-ups. Thanks @benbovy!
👍 thanks for the reviews @jorisvandenbossche !
This PR refactors the Geography (sub)classes in spherely and adds geography creation scalar functions.
Supersedes #26. Closes #41. Closes #27.
Geography
wrapper class remainsTODO:
multipolygon
creation functiontest_creation.py