bcgov / bcdata

An R package for searching & retrieving data from the B.C. Data Catalogue
https://bcgov.github.io/bcdata
Apache License 2.0
82 stars 13 forks source link

[JOSS Review] UF5: Improve documentation for conversion to bounding box #243

Closed ateucher closed 3 years ago

ateucher commented 3 years ago

UF5: The documentation for object conversion to a bounding box when using the spatial predicates could be improved.

The message the user sees:

The object is too large to perform exact spatial operations using bcdata. To simplify the polygon, a bounding box was drawn around the polygon and all features within the box will be returned. Options include further processing with on the returned object or simplify the object.

does not say what size threshold was used, and how much it was exceeded by. If I would prefer to take the simplification option, assuming it is a genuine option, it would be nice to know what target I am aiming for and how close I am to it.

The first place I would look to demystify this would be the help file for the spatial predicates (cql_geom_predicates), but I think the help is underdone with respect to this functionality. It should probably mention the internal option bcdata.max_geom_pred_size and how the size is calculated. Alternately it could refer users to the efficiently-query-spatial... vignette for this information.

On that point, I noticed in the source that the size is calculated on the entire spatial object, rather than just the geometry as I would have expected. See: https://github.com/bcgov/bcdata/blob/master/R/cql-geom-predicates.R#L114.

My recommendation would be to change this so that the size is calculated just on the data that will be translated to text via your sf::st_as_text call. This will make the code a bit more cumbersome, but s3 methods for size calculation might be used to mitigate that. With this change the sizing of objects is much more predictable, making it a simpler task to bring data below the threshold.

Users could then be told it's a case of utils::object.size(st_geometry(obj)) <= getOption("bcdata.max_geom_pred_size") or a bounding box will be used.

If you wanted to be really kind you could also expose a function that helps the user determine if a spatial object will be converted to a bounding box in their predicates.

If you don't take my recommendation, then I think the size calculation is surprising enough that it needs to be well documented.

A summary of UF5 suggestions:

Originally posted by @MilesMcBain in https://github.com/bcgov/bcdata/issues/239#issuecomment-762230962