expersso / BIS

Programmatic access to BIS data

get_datasets() proxy issue #4

Open ab2dridi opened 5 years ago

ab2dridi commented 5 years ago

Hello. As in the similar issue in the OECD package (https://github.com/expersso/OECD/issues/11), could you modify the package to support corporate proxies using httr? The solution is to modify the get_datasets() function as below:

get_datasets <- function() {
  url <- complete_url("/statistics/full_data_sets.htm")
  page <- xml2::read_html(httr::GET(url))
  nodes <- rvest::html_nodes(page, xpath = "//a[contains(@href, 'zip')]")
  dplyr::tibble(
    name = rvest::html_text(nodes),
    url = complete_url(rvest::html_attr(nodes, "href"))
  )
}

expersso commented 5 years ago

I don't think adding httr as a dependency is the right course of action here. The package should work fine with a corporate proxy as long as you set your https_proxy environment variable.
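For reference, a minimal sketch of setting the proxy variables from within an R session. The proxy address is a hypothetical placeholder, not something taken from this thread. Note that this only tells the underlying HTTP client where the proxy is; it does not choose an authentication scheme.

```r
# Hypothetical proxy address (an assumption for illustration); replace with your own.
proxy_address <- "http://proxy.example.com:8080"

# Set the variables for the current R session only.
Sys.setenv(
  http_proxy  = proxy_address,
  https_proxy = proxy_address
)

# Confirm what the session now sees.
Sys.getenv("https_proxy")
```

To make the setting persistent, the same variables can instead be placed in an .Renviron file.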

ab2dridi commented 5 years ago

Thank you for your response. It's not working for me using the http_proxy and https_proxy environment variables: I get a 407 error. We are using NTLM auth.

datasets <- get_datasets()
Error in open.connection(x, "rb") : Received HTTP code 407 from proxy after CONNECT

The only solution that works for me is to use httr::GET(url).

Thank you :)

dbradnum commented 3 years ago

Hi,

Apologies for returning to a pretty old issue - but I've just discovered this after a colleague ran into the same problem. As it happens, I'm also the author of the linked issue above in the OECD package.

After testing, I also agree with @ab2dridi - get_datasets() doesn't work even when the proxy server address is configured with an environment variable: that isn't always enough to authenticate with the proxy. So I think the change he suggests, to use httr::GET(), would be very helpful. (Would you be open to a PR?)

(Digging into the details a bit: the key thing seems to be that xml2::read_html(url) uses the curl package under the hood, and I don't know of any way to configure that with the proxy server's authentication mode, i.e. NTLM in our case. It doesn't appear that libcurl has an environment variable to set this, sadly - see here.)
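To illustrate why going through httr helps, here is a rough sketch of how the proxy authentication mode can be set explicitly with httr::use_proxy(). The proxy host, port, the PROXY_USER and PROXY_PASS variable names, and the read_html_via_proxy() helper are all hypothetical and not part of the BIS package.

```r
library(httr)

# Hypothetical proxy host, port and credential variables (assumptions);
# replace with the values for your own network.
proxy_cfg <- use_proxy(
  url      = "proxy.example.com",
  port     = 8080,
  username = Sys.getenv("PROXY_USER"),
  password = Sys.getenv("PROXY_PASS"),
  auth     = "ntlm"  # the part that an environment variable alone does not cover
)

# Hypothetical helper: fetch a page through the proxy, then parse the response.
read_html_via_proxy <- function(url, cfg = proxy_cfg) {
  resp <- GET(url, cfg)
  stop_for_status(resp)  # turn HTTP errors (e.g. 407) into R errors
  xml2::read_html(resp)
}
```

Alternatively, httr::set_config(proxy_cfg) applies the same settings to every later httr request in the session, which is what would make a httr::GET(url) call inside get_datasets() pick them up without further changes.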