IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r
61 stars 25 forks source link

Refactor to an internal "kernel" function? #51

Open wibeasley opened 4 years ago

wibeasley commented 4 years ago

@adam3smith wrote in https://github.com/IQSS/dataverse-client-r/issues/49#issuecomment-579343318

A different approach would be to make it easier to construct native API queries from scratch using the client so that we don't have to code everything but it's more readily accessible. I don't know how hard that would be, but when I wrote API functionality (for downloading .zip files) locally it required quite a bit of code duplication from dataverse library.

We encountered something similar in REDCapR. It's a similar package as this one --in the sense that it's basically a glove around curl calls to a server's api. Then it adds to convenience functions and validation to make R development easier because it returns R's native data.frame objects.

When we saw all the duplication in REDCapR, we refactored that core functionality into kernel_api(). That central location makes it easier to improve things consistently like character encoding and error handling. It returns raw result to the calling function, which decides whether to save it as a local file, or return a data.frame.

In dataverse::get_file(), I see the replication @adam3smith's talking about, with three instances similar to

u <- paste0(api_url(server), "access/datafile/", file, "/metadata/", format)
r <- httr::GET(u, httr::add_headers("X-Dataverse-key" = key), ...)
httr::stop_for_status(r)

This could become its own function like

kernel <- function(command, server, key, ...) {
  u <- paste0(api_url(server), command)
  r <- httr::GET(u, httr::add_headers("X-Dataverse-key" = key), ...)
  httr::stop_for_status(r)
  r
}

The code inside [dataverse::get_file_metadata()]() would look like

command <- paste0("access/datafile/", file, "/metadata/", format)
kernel(command, server, key)

The code inside the current dataverse::get_file() would have three different calls:

command <- paste0("access/datafiles/", file)
kernel(command, server, key)
...
command <- paste0("access/datafile/bundle/", file)
kernel(command, server, key)
...
command <- paste0("access/datafile/", file)
kernel(command, server, key)
...

@adam3smith, @kuriwaki, or anyone else, please share your impressions if you'd like. It looks like @skasberger and the pyDataverse do something very similar with their get_request()

adam3smith commented 4 years ago

I like the idea a lot -- it would both simplify coding in the package but also make it much easier to use functionality unsupported by the client (my linked use case).

Two comments on the above:

  1. I'm not sure about kernel -- not sure that's a very intuitive label in this case. How about mirroring pyDataverse and doing get_request() and post_request()
  2. The last bit is my second part. Having this separately available for POST and GET would be very desirable imo
kuriwaki commented 3 years ago

Anything that functionalizes all that paste duplication would be great for me! I agree that kernel is an uninformative name for my background. What @adam3smith suggests sounds better, or at least I'd like to know the definition of kernel in this context.