benbovy / spherely

Manipulation and analysis of geometric objects on the sphere.
https://spherely.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
119 stars 8 forks source link

Refactor Geography and add (scalar) creation functions #51

Closed benbovy closed 1 month ago

benbovy commented 1 month ago

This PR refactors the Geography (sub)classes in spherely and adds geography creation scalar functions.

Supersedes #26. Closes #41. Closes #27.

TODO:

benbovy commented 1 month ago

Yes I've been working further on this (last updates not uploaded yet) but that might be too much of a change indeed. I'll restrict this PR to the refactoring of Geography wrapper classes.

Also I'm not exactly sure whether we should closely reuse Shapely's API for creation functions. I find that API convoluted with the mixing of a unique coords argument vs. separate x, y (and z) arguments as well as the different behavior depending on indices. We should probably reuse the same API for compatibility although it makes things more cumbersome to implement.

jorisvandenbossche commented 1 month ago

Yeah, the vectorized creation functions in shapely are quite complex, and I am also not sure if they are always that useful (like creating polygons from a single 3D array, which I think in practice is only useful to generate some random triangles or to create bboxes), and agreed we should carefully think about which API we exactly want here. That's also the reason I mentioned leaving out the vectorized creation functions for now, and only providing the scalars ones (that already exist and that are used in the tests). For points, I would keep the vectorized function you already have, since in this case it is quite simple, I think? (and you already have that)

benbovy commented 1 month ago

For points, I would keep the vectorized function you already have, since in this case it is quite simple, I think? (and you already have that)

Yes. For a single array this is currently limited to 2D (i.e., not a true ufunc like in shapely) but that's fine for now. I don't think that we can use pybind11.vectorize to extend support to a single nD input array (https://github.com/pybind/pybind11/issues/1294).

I wouldn't be too hard I think to mimic the behavior of a ufunc by relying on pybind11's array view & indexing capabilities (+ some manual computation of nD indices), though. That would be even easier to let xtensor(-python) takes care of all of that (also for creating temporary arrays of geographies on the C++ side while releasing the GIL), but I'm not sure if adding such build dependencies is worth it.

jorisvandenbossche commented 1 month ago

Nice progress! One thought on the API: given that we don't follow exact shapely anymore with the subclass constructors (shapely.Point(..) etc), we could also follow the naming of s2 and bigquery and others for the scalar creation functions, and use something like make_point() instead of point(), etc. (although I see they are also not consistent in it, as they have st_geogpoint and st_makeline/st_makepolygon ...)

(I am not necessarily convinced myself that this is better (it's longer to type, but tab completion will put them all together, and there is a clearer distinction from being a type), mostly just mentioning it in case you would have a stronger opinion)

benbovy commented 1 month ago

Thanks for the review comments @jorisvandenbossche.

I also have been thinking about the naming... Likewise I'm still not fully convinced myself and I don't have strong opinions either.

We could also use the create_ prefix for what we call "geography creation" functions...

benbovy commented 1 month ago

This is getting close. I'd just like to investigate the segfault when creating nested geography collections before making this PR ready (EDIT: fixed segfault).

A few other comments that we can address later:

jorisvandenbossche commented 1 month ago

This is getting close.

A few other comments that we can address later:

Agreed! Let's indeed get this in (to have the basics working, and will also be easier for other PRs adding features), and the things below can be handled in follow-ups.

Personally I am fine with that limitation for now, especially given that s2geography essentially does not make the distinction between single and multi geometries anyway (as you linked), and that it is only for output (like WKT, or asking for the type) that the distinction is introduced, based on the number of elements stored in the object.

  • The collection type is named GeographyType::GeographyCollection but both the Geography repr and WKT have the GEOMETRYCOLLECTION name. Do we want to uniformize things and rename GeographyType::GeographyCollection to GeographyType::GeometryCollection?

I would maybe lean towards GeographyType::GeometryCollection, although that is certainly odd as well ..

Another case where this occurs is in the spherely.geography_collection(..) constructor. For that case, we could also shorten it to just collection() to avoid that issue? (or at least when we add a prefix like create_.., that might be sufficient)

  • It is not currently possible to create a MULTIPOLYGON with duplicate rings, due to the fact that s2geography::PolygonGeography reuse the same underlying S2Polygon for multi-polygons. Maybe if we deactivate s2 normalization and/or validity check it will pass but will it work ?

A multipolygon with duplicate rings also sounds like "wrong" .. (in shapely / GEOS, that would be an invalid geometry anyway, and thus give unexpected results), so I am personally fine with not allowing to create it.

Although I suppose at some point we should allow to create invalid geographies, end then have a method to check if they are valid / what the reason is for the invalidity (for example s2_make_polygon has a check keyword to disable validation when constructing a polygon)

This is also another case (together with the geometrycollection / geographycollection above) where it points the the mismatch between the "simple features" model we try to emulate and how s2 works. For the constructor functions you implemented here, it is nice to make the distinction between all the types, I think (as it gives a more user friendly interface), but for accessors like the geography type, we could also stay more closely to s2geography and have a function that just returns point/line/polygon/collection

  • Scalar creation function naming: still no strong opinion, although I would lean towards either point(), linestring(), etc. or create_point(), create_linestring(), etc.

Let's open a separate issue for this and see if we can get some more opinions

benbovy commented 1 month ago

I would maybe lean towards GeographyType::GeometryCollection, although that is certainly odd as well ..

Another case where this occurs is in the spherely.geography_collection(..) constructor. For that case, we could also shorten it to just collection() to avoid that issue?

I renamed GeographyType::GeographyCollection -> GeographyType::GeometryCollection and geography_collection() to collection(), I also think it looks nicer for now.

jorisvandenbossche commented 1 month ago

OK, merged this, so we can move forward with other PRs and follow-ups. Thanks @benbovy!

benbovy commented 1 month ago

👍 thanks for the reviews @jorisvandenbossche !