daranzolin / rcanvas

R Client for Canvas LMS API

Parse 'unfriendly' link headers #5

Closed: stillmatic closed this 7 years ago

stillmatic commented 7 years ago

See the discussion in #4: this PR is needed to support queries whose link header lacks a properly formed 'last' rel.

Broadly, we understand the link structure for a given Canvas API call; all we need to figure out is the number of pages. Then we do one of two things: if the 'last' rel is given, we use its page number; otherwise, we check the header of each page, incrementing n_pages until there is no longer a next page. From this n_pages we construct the vector of pages to request.
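For illustration, a minimal sketch of that two-path page count. The name count_pages and everything inside it are mine, not the PR's (the real logic lives in process_response.R), and it assumes the link header is present:

library(httr)
library(stringr)

count_pages <- function(url, args = list()) {
  resp <- GET(url, query = args)  # auth config omitted for brevity
  rels <- str_split(headers(resp)$link, ",")[[1]]
  last_rel <- str_subset(rels, 'rel="last"')
  if (length(last_rel) == 1) {
    # Fast path: 'last' rel present, read the page number straight off it
    return(as.integer(str_match(last_rel, "page=(\\d+)")[, 2]))
  }
  # Slow path: follow 'next' rels, incrementing until they run out
  n_pages <- 1
  next_rel <- str_subset(rels, 'rel="next"')
  while (length(next_rel) > 0) {
    next_url <- str_sub(str_extract(next_rel, "<[^>]+>"), 2L, -2L)
    resp <- GET(next_url)
    rels <- str_split(headers(resp)$link, ",")[[1]]
    next_rel <- str_subset(rels, 'rel="next"')
    n_pages <- n_pages + 1
  }
  n_pages
}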

Since this logic is common to multiple query types (i.e., not just submissions), I've refactored the other queries to use the same logic, now contained in process_response.R. I've also made the canvas_query function more generic, accepting the different request types that httr supports (I think you had a similar change locally).
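A rough sketch of what a verb-generic canvas_query could look like; the PR's actual signature may differ, and CANVAS_API_TOKEN is just a placeholder for however the package stores its token:

library(httr)

canvas_query <- function(url, args = list(), type = "GET") {
  # Look up the httr verb function (GET, POST, PUT, ...) by name
  verb <- getFromNamespace(type, "httr")
  resp <- verb(url, query = args,
               add_headers(Authorization = paste("Bearer", Sys.getenv("CANVAS_API_TOKEN"))))
  stop_for_status(resp)
  resp
}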

daranzolin commented 7 years ago

Okay cool, I like what you've done here, but parse_response() triggered some errors in initial testing on my end. It looks like httr::headers(x)$link returns NULL for certain functions, and that breaks things further down the line.

I'll have more time to get into it later this weekend.

stillmatic commented 7 years ago

Will take a closer look after NYE. I think the fix is along the lines of setting n_pages = 1 when link is NULL, so long as there actually is a response (maybe checking for status code 200).
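Something like this guard (a sketch; the variable names are illustrative):

# No link header on a successful response just means one page
link <- httr::headers(resp)$link
if (is.null(link) && httr::status_code(resp) == 200) {
  n_pages <- 1
}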

daranzolin commented 7 years ago

I hear you, happy new year! rcanvas will be huge in 2017, let's make it happen! Thanks for all your help this year.

stillmatic commented 7 years ago

Happy new year to you as well!

The bug is what I suspected, and I can replicate it with something like users <- get_course_items(12345, item = "students"). Should be fixed here!

daranzolin commented 7 years ago

Hmm, increment_pages() isn't returning the next page on my end. I added print statements for the n_pages and page_temp objects inside get_pages(), and here is part of the looping while output:

[screenshot: while-loop output printing n_pages and page_temp]

We would just need to modify base_url and maybe update the regex, although I wonder if it would be better to grab the bookmarked next page rather than build it up by incrementing. get_next_page() might look like this:

get_next_page <- function(resp) {
  # Matches an http(s) URL; the character class is broad enough that it
  # also swallows the trailing ">;" from the link header
  url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
  # Split the link header into its individual rels
  next_page <- stringr::str_split(httr::headers(resp)$link, ",")[[1]]
  # Keep only the bookmarked 'next' rel
  next_page <- stringr::str_subset(next_page, "rel=\"next\"")
  # Pull out the URL, then drop the trailing ">;" the pattern captured
  next_page <- stringr::str_extract(next_page, url_pattern)
  next_page <- stringr::str_sub(next_page, end = -3L)
  return(next_page)
}

stillmatic commented 7 years ago

What is the proper output in that case? Like, if you have a URL with 'page=first', what are the other pages supposed to be? Would pages 1:6 work there?

I can't view pageviews on my end, so I can't test this.

daranzolin commented 7 years ago

I just pushed some new changes, building on your work. I added a paginate() function, which collects the responses from successive GET requests into a list. It seemed a little simpler, and it's working okay on my end, but let me know what you think.
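For reference, that response-collecting loop might look roughly like this, reusing the get_next_page() helper above (a sketch, not the committed paginate()):

paginate <- function(resp) {
  resps <- list(resp)
  # A missing link header means a single, unpaginated response
  while (!is.null(httr::headers(resp)$link)) {
    next_url <- get_next_page(resp)
    if (length(next_url) == 0 || is.na(next_url)) break
    resp <- httr::GET(next_url)  # auth config omitted for brevity
    resps[[length(resps) + 1]] <- resp
  }
  resps
}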

stillmatic commented 7 years ago

I think it looks fine; I tested a few things and it was working. I agree that following the 'next' header is probably the safer method.

Only a couple of quibbles in process_response.R:

daranzolin commented 7 years ago

I agree. Latest commit fixes both.