chircollab / chircollab20

Home for the 2020 Chicago R Collab event
https://chircollab.github.io/
10 stars 3 forks source link

Replacing uses of single-request {httr} functions with RETRY() #1

Open jameslamb opened 4 years ago

jameslamb commented 4 years ago

TL;DR

There are a ton HTTP clients on CRAN, and many of them make single pass-or-fail HTTP requests. If they added some retry logic (with httr::RETRY()), they would be more resilient to transient issues like brief network disruptions or service downtime.

Details

{httr} is a super-popular package for making HTTP requests in R and handling their response. It has a LOT of direct reverse imports, and impacts a lot of other projects indirectly.

image

I have seen many examples of packages where people use functions like httr::GET(), httr::POST(), httr::VERB(), etc. to make HTTP requests. These functions attempt to make a single request and raise an error if anything goes wrong.

I believe (though I don't know for sure) that this all-or-nothing approach is not something package authors have carefully and intentionally chosen, and changing those calls to httr::RETRY() would make those packages more resilient to transient problems like brief network outages or periods where the service(s) it hits are overwhelmed. In my experience, using retry logic can improve the user experience with HTTP clients.

I propose a group spend some time improving the ecosystem of HTTP clients in R by repeating these steps:

  1. choose a package from the {httr} reverse imports
  2. go to the package's source code on GitHub (if it exists)
  3. find all references to httr::GET(), httr::POST(), httr::PUT(), httr::VERB(), httr::DELETE(), httr::HEAD(), and httr::PATCH() and replace them with the appropriate httr::RETRY()
  4. create a pull request with the change, including some text in the summary explaining to the package author how this change will make their code more reliable (see https://github.com/ropensci/eia/pull/5 for an example)

I'd be happy to start out with a short introduction to HTTP and what httr::RETRY() actually does.

emilyriederer commented 4 years ago

Thanks for posting, @jameslamb ! I love this idea of taking the time to shore up bits and pieces of many different packages and spreading best practices.

Another good outcome might be seeing if RStudio would take a PR to {httr} 's 'Best Practices for API packages' vignette to share this advice more broadly

jdblischak commented 4 years ago

My workflowr package is one of those reverse dependencies, and I super welcome a PR to make the GitHub API calls more robust.

The function create_gh_repo() makes 3 GET requests and a POST request.

jameslamb commented 4 years ago

I wrote up a 101-level "wtf is HTTP" last night: https://github.com/jameslamb/talks/tree/master/chi-r-collab-httr. Decided to put it in that repo because I might turn it into a lightning talk in the future.

For those who want to do the "go make PRs to existing packages" part of this project, here are some examples:

The basic process is:

  1. Go to httr's CRAN page
  2. Look at the packages in Reverse Depends and Reverse Imports. Choose one that sounds interesting and click on it.
  3. Hopefully the package has a URL: section with a GitHub link. Click it.
  4. From that repository, click fork to fork it. (GitHub docs)
  5. Clone the repository to your machine.(GitHub docs)
  6. Look for uses of httr single-shot functions.
    • These are:
      • DELETE()
      • GET()
      • HEAD()
      • POST()
      • PATCH()
      • PUT()
      • VERB()
    • You can find these with commands like git grep 'POST(' or using Find in Files in RStudio
  7. Replace those calls with httr::RETRY(). See the pull requests linked above for some examples.
  8. Commit your changes, push them to your fork, and create a pull request (GitHub docs)
    • When you open the PR, be sure to add a link to this issue in the description. That way GitHub will do the hard work of tracking for us, like this!

image

jameslamb commented 4 years ago

Some resources shared during our first chat: