Thanks. What about making a new set of functions `get_url_by_*` instead of adding an option? Putting it in `get_file()` seems to muddy the function a bit, as it would in `get_dataframe_*`.
@kuriwaki Maybe a dumb suggestion, but if the point of returning the URL is to create a second function that downloads a file from Dataverse, why not build it into the existing dataverse package? You could create a separate function like `download_file_by_url()`.

But I could also imagine building it into the current functions with options like `download = TRUE|FALSE` and `download_path = 'path_to_file'` instead of `return_url = TRUE|FALSE`. The end of the `get_file_by_id()` function would then become:
```r
u <- paste0(api_url(server), u_part, fileid)
if (isTRUE(download)) {
  # stream the response straight to disk instead of holding it in memory
  httr::GET(u, httr::add_headers(`X-Dataverse-key` = key),
            query = query, httr::progress(type = "down"),
            httr::write_disk(download_path, overwrite = TRUE), ...)
} else {
  r <- httr::GET(u, httr::add_headers(`X-Dataverse-key` = key),
                 query = query, httr::progress(type = "down"), ...)
  httr::stop_for_status(r, task = httr::content(r)$message)
  httr::content(r, as = "raw")
}
```
(Note that I removed the `if` around the progress bar that is currently in that part of the function, as I think it is always useful to see the progress of what is happening, even if it adds some overhead. I also used `if/else` so that the raw content is actually the return value when `download` is `FALSE`.)
I did not test whether the `query` and `add_headers` parts work in combination with `write_disk`, but I assume they would.
Ignore the above if it misses the purpose of the suggestion.
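As a side check on the untested `write_disk` combination mentioned above, the same httr options can be exercised against a throwaway endpoint. This is only a sketch; httpbin.org and the dummy key are purely illustrative:

```r
# Minimal sketch: confirm that add_headers, query, and write_disk combine in a
# single httr::GET call. The endpoint and key are illustrative, not Dataverse.
tmp <- tempfile(fileext = ".json")
r <- httr::GET(
  "https://httpbin.org/get",
  httr::add_headers(`X-Dataverse-key` = "dummy"),
  query = list(format = "original"),
  httr::write_disk(tmp, overwrite = TRUE),
  httr::progress(type = "down")
)
httr::stop_for_status(r)
cat(readLines(tmp), sep = "\n")  # the echoed request shows the header and query
```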
I was thinking of something similar to the first option of @Danny-dK -- a function separate from the `get_dataframe_*` family. Unless others think otherwise, I will try to implement this standalone URL-returning function in the next CRAN fix.
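For concreteness, such a standalone helper could be a thin wrapper over the existing machinery. The following is only a hypothetical sketch that assumes `get_file_by_name()` accepts a `return_url` argument; the eventual implementation may differ:

```r
# Hypothetical sketch, not the package's actual code: a standalone URL getter
# that forwards to get_file_by_name() with return_url = TRUE (assumed API).
get_url_by_name <- function(filename, dataset, ...,
                            server = Sys.getenv("DATAVERSE_SERVER")) {
  get_file_by_name(filename = filename, dataset = dataset,
                   server = server, return_url = TRUE, ...)
}
```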
@JBGruber @Danny-dK I've opted to make a `get_url_*()` function that simply wraps around your infrastructure and returns a URL. That is, `get_file(return_url = TRUE)` still returns a URL, but I thought a function named `get_url_*` is easier to remember. I have not added a `download_*` function yet. Here is the help page as a reprex:
```r
library(dataverse)

# get URLs
get_url_by_name(
  filename = "nlsw88.tab",
  dataset = "10.70122/FK2/PPIAXE",
  server = "demo.dataverse.org"
)
#> [1] "https://demo.dataverse.org/api/access/datafile/1734017?format=original"

# For ingested, tab-delimited files
get_url_by_name(
  filename = "nlsw88.tab",
  dataset = "10.70122/FK2/PPIAXE",
  original = FALSE,
  server = "demo.dataverse.org"
)
#> [1] "https://demo.dataverse.org/api/access/datafile/1734017"

# To download to a local directory
curl::curl_download(
  "https://demo.dataverse.org/api/access/datafile/1734017?format=original",
  destfile = "nlsw88.dta"
)
```
Created on 2024-05-12 with reprex v2.0.2
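As a small follow-up (not part of the original reprex), the downloaded file is a Stata `.dta` file, so it could be read back into R with haven, assuming that package is installed:

```r
# Not in the original reprex: read the downloaded Stata file back into R
# (assumes the haven package is available).
nlsw88 <- haven::read_dta("nlsw88.dta")
head(nlsw88)
```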
Updated the tests and edited the previous post above to show the correct reprex.
Hey, thanks for carrying this over the finish line, @kuriwaki, and sorry for being unresponsive to the requests to change things. I had to use this again just now and it worked perfectly!
@JBGruber great to hear.
Please ensure the following before submitting a PR:

- make changes to files in `/R`, not `/man`, and run `devtools::document()` to update documentation
- add tests in `/tests` for any new functionality or bug fix
- make sure `R CMD check` runs without error before submitting the PR

Description:
As noted in #128, I believe it makes sense to give the `get_file_by_*` functions the option to return a URL, so that larger files can be downloaded using other packages or software. Here is a quick demo:

Created on 2023-09-14 with reprex v2.0.2
I then like to use `curl::curl_download()` since it is fast and reliable. But it's up to you :grin:
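A sketch of the workflow being proposed, with the exact API an assumption (`return_url = TRUE` on `get_file_by_name()` is the option under discussion, not yet settled):

```r
# Sketch of the proposed workflow (API assumed): get a URL from dataverse,
# then hand it to curl for the actual download of a large file.
url <- get_file_by_name(
  filename = "nlsw88.tab",
  dataset  = "10.70122/FK2/PPIAXE",
  server   = "demo.dataverse.org",
  return_url = TRUE
)
curl::curl_download(url, destfile = "nlsw88.tab")
```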
Input needed:
What is left to do is to decide what happens with the `get_dataframe_by_*` functions. I would suggest that they should error when the `return_url` parameter is set, since returning a URL instead of a data frame makes little sense, I believe.
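To make that suggestion concrete, a guard could fail fast at the top of each function. This is an illustrative sketch only, with a hypothetical function name, and not the package's actual code:

```r
# Illustrative sketch only: error out early if return_url is passed to a
# get_dataframe_by_*-style function (the function name is hypothetical).
get_dataframe_by_name_sketch <- function(filename, dataset, ...,
                                         return_url = FALSE) {
  if (isTRUE(return_url)) {
    stop("`return_url` is not supported in get_dataframe_by_*(); ",
         "use get_url_by_name() instead.", call. = FALSE)
  }
  # ...normal download-and-read logic would follow...
}
```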