R-ArcGIS / arcgisutils

http://r.esri.com/arcgisutils/
Apache License 2.0
15 stars 2 forks source link

`rbind_results()` helper function #42

Closed JosiahParry closed 8 months ago

JosiahParry commented 8 months ago

This PR adds a new function that takes a list of data.frames or sf objects and combines them as efficiently as possible.

Closes #38 Related https://github.com/R-ArcGIS/arcgislayers/pull/167

This helps us with the pattern described in #38. There are two points I want to make. One of the annoying bits of the Esri REST API endpoints is that errors are not returned as a status code, but rather, the error is captured in the json body.

Avoiding httr2::resps_data()

httr2::resps_data() is a useful function, however, it uses vctrs::list_unchop() which is only marginally faster than do.call(rbind.data.frame, list()) whereas collapse::rowbind() is orders of magnitude faster. The approach introduced in #42 will use collapse, data.table, vctrs, then base R to ensure the fastest approach is used. These are incompatible with httr2::resps_data().

Errors are not captured 400 codes

One challenging part of the ArcGIS location service endpoints is that errors are not returned as a 400 code. Instead, they are a json response with the error as plain text json.

The approach that I think might be most streamlined is this:

  1. for each response process the results, if an error is encountered return NULL
  2. Combine the results (a la #42)
  3. Use the NULL responses to add error information to the response as an attribute. How do we do this?

This approach can be wrapped in another utility function similar to resps_data() something like arc_resps_data() which would have a signature like so:

arc_resps_data <- function(resps, .f)  {
  rbind_results(lapply(resps, .f))
}

The attribute null_elements returns the indices of the errors. Then each function can either return the requests / response or...something?

@elipousson, would you mind reviewing if you have time?

elipousson commented 8 months ago

Taking a quick look now, looks good to me. Moving this into {arcgisutils} should definitely reduce the complexity of arc_select()! A couple of notes:

I think the inherits_or_null call should handle this but it did take some figuring out how to make sure this worked with tables, feature layers, and layers when returnGeometry was set to FALSE. There are also endpoints that return KML, XML, or other data types which this function would presumably not be able to handle.

JosiahParry commented 8 months ago

Thank you very much for your feedback! I've modified the check_data_frame() call so that the call is passed along with it as well as the arg value. There is a new test for this as well.

I've fixed the documentation to reference vctrs::list_unchop() and I've added an unused argument .ptype to rbind_results() that is reserved for a future time.