USAID-OHA-SI / grabr

OHA/SI APIs package
https://usaid-oha-si.github.io/grabr/

"Page Time Out" error running `pano_session` #35

Closed achafetz closed 11 months ago

achafetz commented 1 year ago

In coRps today, @jess-stephens was running pano_session and getting an error.

grabr::pano_session()
#> Error in login_sess$status: $ operator is invalid for atomic vectors

Created on 2023-11-29 with reprex v2.0.2

@karishmas26 unpacked this by running through the steps and found that the issue arises from the session info. Note: this is not a credentials issue.

https://github.com/USAID-OHA-SI/grabr/blob/ee3d342072c5ff30cd48c8cfeac5835860ed7a2f/R/extract_pano.R#L36-L43

When running ln36-7, the login_sess value ends up being "Page Timed Out", which is not what the script expects in the if statement on ln39. As a result, you get the error message above.
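To illustrate why a plain string triggers this particular message, here is a minimal base R sketch. The value assigned to `login_sess` is a stand-in for what the site returned, not output captured from `pano_session()`.

```r
# Stand-in for the value described above: a bare string instead of a parsed list
login_sess <- "Page Timed Out"

# `$` works on lists and environments, not on atomic vectors such as strings,
# so accessing `$status` fails before the if statement can evaluate anything
msg <- tryCatch(
  login_sess$status,
  error = function(e) conditionMessage(e)
)

print(msg)
```

The message captured in `msg` is the one shown in the reprex above, which is why the failure reads like a script bug rather than a site timeout.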

The larger issue is the page timing out. I was able to run this function on Monday (on AIDNET) to download the new MSDs, but it is resulting in this error for all of us (JS, KS, and AC) today.

The secondary issue is that we need to return a better error message to the user, one that surfaces the error from the site rather than the error in the script.
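One possible shape for that check, as a rough sketch only: `check_login_response()` is a hypothetical helper, not a function in grabr, and it assumes the parsed login response is a list with a `status` element on success and a bare string (such as "Page Timed Out") on failure.

```r
# Hypothetical helper (not part of grabr): validate the parsed login response
# before any code tries to read fields from it with `$`
check_login_response <- function(login_sess) {
  # A non-list value is assumed to be a message from the site itself,
  # so pass it back to the user verbatim
  if (!is.list(login_sess)) {
    stop(
      "Panorama login failed; the site returned: ",
      paste(as.character(login_sess), collapse = " "),
      call. = FALSE
    )
  }
  # Assumption: a usable response carries a `status` element
  if (is.null(login_sess$status)) {
    stop("Panorama login response has no `status` field.", call. = FALSE)
  }
  invisible(login_sess)
}

# Usage: the site's own message is what reaches the user
res <- tryCatch(
  check_login_response("Page Timed Out"),
  error = function(e) conditionMessage(e)
)
print(res)
```

Which `status` values count as success is left open here, since that depends on what the server actually returns.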

achafetz commented 1 year ago

I tested off AIDNET (by joining the guest network) and still got the same "Page Timed Out" error.

baboyma commented 1 year ago

Looking into this. I think there have been some changes in the authentication flow.

baboyma commented 1 year ago

Ok, found a solution:

  1. Request a nonce token from the server (so it remembers which client will be logging in)
  2. Send a POST request for a session with the token from step 1

Note: this means all pano_* functions will need to use a pregenerated token (instead of the regular user/pass). We also need to account for token expiration.
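The expiration concern could be handled with a small cache that refetches the token once it is too old. The sketch below is an assumption-laden illustration: `make_token_cache()`, `fetch_token`, and `ttl_secs` are hypothetical names, the real token lifetime is not stated in this thread, and the actual nonce and session requests are left out because their endpoints are not given here.

```r
# Hypothetical helper (not part of grabr): wrap a token-fetching function so the
# token is reused until it expires, then fetched again
make_token_cache <- function(fetch_token, ttl_secs, now = Sys.time) {
  token <- NULL
  expires_at <- NULL

  function() {
    # Fetch on first use, or when the cached token has reached its expiry time
    if (is.null(token) || now() >= expires_at) {
      token <<- fetch_token()
      expires_at <<- now() + ttl_secs
    }
    token
  }
}

# Usage with a fake fetcher and a fake clock, so no network call is made
n_fetches <- 0
fake_time <- 0
get_token <- make_token_cache(
  fetch_token = function() {
    n_fetches <<- n_fetches + 1
    paste0("token-", n_fetches)
  },
  ttl_secs = 60,              # assumed lifetime, for illustration only
  now = function() fake_time
)

first <- get_token()   # fetches a new token
fake_time <- 30
second <- get_token()  # still valid, reuses the cached token
fake_time <- 60
third <- get_token()   # expired, fetches again
```

In real use, `fetch_token` would perform the two steps listed above, and each pano_* function would call `get_token()` rather than holding on to a token directly.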

achafetz commented 12 months ago

I tested this out this AM.

#install dev branch
remotes::install_github("USAID-OHA-SI/grabr", ref = "develop")

#load
library(grabr)

url <- "https://pepfar-panorama.org/forms/downloads/"

#test 1 - can successfully create a session? --> SUCCESS
sess <- pano_session()

#test 2 - can successfully see items? --> FAIL
pano_items(url, sess)

#test 3 - can successfully extract most recent period folder? --> FAIL
url %>%
 pano_content(session = sess) %>%
 pano_elements() %>%
 dplyr::filter(stringr::str_detect(item, "^MER")) %>%
 dplyr::pull(item)

#reinstall prod version
rstudioapi::restartSession()
pak::pak("USAID-OHA-SI/grabr")

So while the session is being created, it seems that either the credentials are not being passed successfully or something else is off. This is the error I get for tests 2 and 3.

Error in httr::content("text") : is.response(x) is not TRUE

The issue appears to arise here, where the session info is passed in via cookies: https://github.com/USAID-OHA-SI/grabr/blob/aa53a63297779013d0a6a58bc9aa457247150bcc/R/extract_pano.R#L84-L86

baboyma commented 12 months ago

I think the issue is within the pano_content()

One of the validations in the if statement is missing x = page:

https://github.com/USAID-OHA-SI/grabr/blob/aa53a63297779013d0a6a58bc9aa457247150bcc/R/extract_pano.R#L88C41-L88C41
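To show why the missing `x = page` produces an `is not TRUE` error, here is a sketch that uses a stand-in instead of httr so it runs without a network connection. `content_stub()` is hypothetical; it only mimics the relevant behaviour of `httr::content()`, whose first argument `x` must be a response object.

```r
# Hypothetical stand-in for httr::content(): the first argument must be a
# response-like object, and the second says how to return the body
content_stub <- function(x, as = "text") {
  stopifnot(inherits(x, "response"))
  x$body
}

# Fake response object, for illustration only
page <- structure(list(body = "<html></html>"), class = "response")

# Called outside a pipe with only "text", the string is matched to `x`,
# so the response check fails
bad <- tryCatch(
  content_stub("text"),
  error = function(e) conditionMessage(e)
)

# Passing the response explicitly lets "text" fall through to `as`
good <- content_stub(x = page, "text")

print(bad)
print(good)
```

Inside a pipe the left-hand side fills `x` automatically, which is why the same call works in the piped section of the function but not in the if condition.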

baboyma commented 12 months ago

pano_content <- function(page_url, session) {

  # Request the page, passing the session token as a cookie
  page <- httr::GET(page_url, httr::set_cookies("formsSessionState" = session))

  # Use && (scalar, short-circuiting) and pass the response explicitly as x
  if (!base::is.null(page) && !base::is.null(httr::content(x = page, "text"))) {
    # Parse the response body as HTML
    page <- page %>%
      httr::content("text") %>%
      rvest::read_html()
  } else {
    base::stop("ERROR - Unable to extract page content")
  }

  return(page)
}