benbovy / spherely

Manipulation and analysis of geometric objects on the sphere.
https://spherely.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
118 stars 8 forks source link

Exposing S2 / s2geography options (for boolean predicates / building operations) #70

Open jorisvandenbossche opened 3 days ago

jorisvandenbossche commented 3 days ago

The spatial predicates already use those options, but at the moment we only set them under the hood and don't allow to override them by the user. In this case, s2geography's functions directly allow you to specify the s2geometry S2BooleanOperation::Options

Those S2BooleanOperation options allow you to set the snap function , the polygon/polyline model (open/closed), and a bunch of other things. And for example we implement covers by calling contains with a "closed" model.

The boolean (overlay, i.e. union/intersection/difference) operations I am adding in https://github.com/benbovy/spherely/pull/62 also have a somewhat similar (super)set of options. In this case, s2geography defines a GlobalOptions options class (which includes the S2BooleanOperation::Options)
Those options allow you, among other things, to specify whether the output should contain points/polylines/polygons (for example, allowing to limit the intersection of polygons to only return the polygonal intersections).

So for both the predicates and the overlays we need a way to allow the spherely user to customize those options.


In the R s2 package, there is a s2_options() object that gathers all those options (https://r-spatial.github.io/s2/reference/s2_options.html) and that can be passed in those places (https://r-spatial.github.io/s2/reference/s2_contains.html, https://r-spatial.github.io/s2/reference/s2_boundary.html)

(the C++ side of this s2_options() object: https://github.com/r-spatial/s2/blob/main/src/s2-options.h)

jorisvandenbossche commented 3 days ago

@paleolimbot based on your experience with R s2 and s2geography, what is your current thought about the approach in the R package?

Or is the s2geography::GlobalOptions already a kind of port of what you have in the R package to s2geography? It seems that effectively the R GeographyOperationOptions class is used (at the C++ level interacting with s2geometry) to get either the s2geography::GlobalOptions (in boolean operations) or the S2BooleanOperation::Options (in predicates)

paleolimbot commented 3 days ago

The relevant R concept is s2_options(): https://r-spatial.github.io/s2/reference/s2_options.html (sorry, I just read that you found that already 😬 )

This is kind of a mishmash of overlay construction options, boolean operation options, and a few other things. I'm not sure squishing all of those into the same concept was a great idea because it's not clear exactly which options are relevant to each function. It would probably be a good idea to separate those at the C++ level too in any place where I didn't, but it also doesn't matter as much there.

In general, we found exposing all of those options to be totally essential to working around user problems with various corner cases! (Mostly: fixing invalid or weirdly defined geometries).

paleolimbot commented 3 days ago

s2 in R is also less of a user-facing package and more of a front for sf's mapping of simple features on to s2, which are not the s2 defaults in most cases. So it was more essential that we had all the options exposed so that Edzer could experiment (rather than have a great user experience toying with advanced non-default options). Wrapping all of them up in to one place made it slightly easier to include them everywhere.

paleolimbot commented 3 days ago

Also, quite a few new boolean operation options since I last looked!

https://github.com/google/s2geometry/blob/master/src/s2/s2boolean_operation.h#L281

benbovy commented 1 day ago

I'm not sure squishing all of those into the same concept was a great idea because it's not clear exactly which options are relevant to each function.

Wrapping all of them up in to one place made it slightly easier to include them everywhere.

I agree with both :).

Global options are useful if we want to make sure those are consistently applied over a whole processing pipeline (e.g., snap).

One way to support both in spherely (in a rather pythonic way?) could be to:

  1. expose the relevant options to each function separately as keyword arguments (maybe exposed as generic **kwargs in the function signature and described under the Other Parameters section in the docstrings to avoid clutter, if the same options end up being repeated all over the place)
  2. expose those options globally via a lightweight class that implements the context management protocol (maybe with some namespace if needed). Options set globally would always be overridden by options passed as function keyword arguments.

We could start with 1 and then implement 2 later.

jorisvandenbossche commented 22 hours ago

This is kind of a mishmash of overlay construction options, boolean operation options, and a few other things. I'm not sure squishing all of those into the same concept was a great idea because it's not clear exactly which options are relevant to each function.

On the other hand, most of the options that are in the R s2_options() are in the s2geography::GlobalOptions, and that are essentially the options needed for the boolean operations (overlays, i.e. unions/intersection/..) ? Do you also think the GlobalOptions in s2geography is too much of a mishmash? Or is that OK?

  1. expose the relevant options to each function separately as keyword arguments

In general I am more a fan of separate keyword arguments (potentially through kwargs if they are repeated in many places) compared to "option classes" (in pyarrow we also make use of option classes in e.g. the CSV read/write functions, and that is not super ergonomic to use), but I am not entirely sure that the kwargs approach will scale nicely at the C++ level (although I assume we could have some helper function that processes the kwargs and creates the relevant s2geography/s2geometry option objects, and that can be reused everywhere)

  1. expose those options globally

Do we think it will be useful to allow setting options globally? For some probably yes (I assume setting a snapping method might be useful to be able to set globally, to avoid having to specify that in every function call). But for example for the output dimension of an operation (i.e. do you want to preserve only the polygonal parts, or only the lines, etc), that depends a lot on the function you call and the input data, and that might be better specified per function. Same for the model (where you might want a different default depending on the operation).

Now, we can of course start with only exposing those options globally that we think makes sense.


We can probably also start with just exposing a subset of the relevant options as we go to get some experience on what is useful/needed.

benbovy commented 21 hours ago

Yes I also generally prefer separate keyword arguments over option classes. I'm not too worried about how to manage a lot of repeated kwargs at the C++ level. Although it will require some work, I'm confident that some helper functions will do the job (maybe also for generating parts of the API docstrings).

Now, we can of course start with only exposing those options globally that we think makes sense.

We can probably also start with just exposing a subset of the relevant options as we go to get some experience on what is useful/needed.

Agreed.

paleolimbot commented 13 hours ago

expose those options globally

I can see how you might want to do something like:

with spherely.s2_options(snap_function=spherely.snap_level(20)):
    # ...a bunch of operations where you only care about a certain level of precision

in pyarrow we also make use of option classes in e.g. the CSV read/write functions, and that is not super ergonomic to use

If pyarrow had reasonable autocomplete, this would probably be fine (and a much better experience than kwargs, which won't give you autocomplete to let you know which options you can set if you don't remember if it was snap or snap_function or whatever). You could have your options object remember which options were set and error/warn for options that were set but are not relevant for the current operation).

I'm not too worried about how to manage a lot of repeated kwargs at the C++ level.

I wonder if you're hitting the limits of what you can accomplish without a Python wrapper layer?

benbovy commented 13 hours ago

Since **kwargs can be typed with Unpack[TypedDict] I wonder if autocompletion would work using a static-checker based lsp server...