Closed paulgronke closed 2 months ago
I'm not sure if the R library supports it but it should work on the backend just fine. Here's an example:
wget --content-disposition 'https://dataverse.unc.edu/api/access/datafile/7527436?format=RData'
Please see https://guides.dataverse.org/en/5.13/api/dataaccess.html#basic-file-access
On the site, when I try to download file 7103004 in RData format manually in a web browser, I receive a 404 error.
I receive the same 404 with the curl command (shown here as the equivalent httr call):
require(httr)
params = list(
`format` = "RData"
)
res <- httr::GET(url = "https://dataverse.harvard.edu/api/access/datafile/7103004", query = params)
Philip's example, by contrast, works fine. Could this be an issue with this specific publication? (The other formats download fine.)
If you don't care about variable labels, this will do:
SPAE22 <- get_dataframe_by_name(
  filename = "MITU0042_OUTPUT_0120.tab",
  dataset = "10.7910/DVN/SPU2XP",
  server = "dataverse.harvard.edu")
@Danny-dK huh, you're right, when I do either of these...
wget --content-disposition 'https://dataverse.harvard.edu/api/access/datafile/7103004?format=RData'
curl 'https://dataverse.harvard.edu/api/access/datafile/7103004?format=RData'
... I get 404 and {"status":"ERROR","code":404,"message":"datafile access error: requested optional service (image scaling, format conversion, etc.) could not be performed on this datafile."}
It's strange because when I go to https://dataverse.harvard.edu/file.xhtml?fileId=7103004 it offers RData as a download format:
Perhaps there's a problem with the file? @Danny-dK please feel free to email support@dataverse.harvard.edu if you'd like someone at Harvard Dataverse to investigate.
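One more thing I should mention is that even offering RData as a file format is somewhat controversial these days. Some people think it's obsolete:
IQSS/dataverse#6678 https://github.com/IQSS/dataverse/issues/6678
IQSS/dataverse#7249 https://github.com/IQSS/dataverse/issues/7249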
I asked this very question on a Slack workspace and got the answer. Downloading an RData file isn’t as simple as SPSS and Stata, but is feasible. It’s not documented in the package documentation, but is documented here in the GitHub development space.
Here is the snippet from the documentation. Note that you will need the numerical dataverse entry number for the file. For my own part, I simply went back to using the SPSS version since it was read into R just fine.
# base::load() ... but cannot be assigned to an object ... get_dataframe_* ..., write as a binary file:
as_binary <- get_file_by_doi(
  filedoi = "doi:10.70122/FK2/PPIAXE/5VPXKE",
  server = "demo.dataverse.org")
temp <- tempdir()
writeBin(as_binary, path(temp, "county.RData"))
load(path(temp, "county.RData"))
load_object <- function(file) {
  tmp <- new.env()
  load(file = file, envir = tmp)
  tmp[[ls(tmp)[1]]]
}
as_rda <- get_dataframe_by_id(
  file = 1939003,
  server = "demo.dataverse.org",
  .f = load_object,
  original = TRUE)
https://iqss.github.io/dataverse-client-r/reference/get_dataframe.html#examples
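To see what the load_object helper in the snippet above does without contacting a Dataverse server, here is a minimal offline sketch. It is an illustration only: the data frame `county` and the temporary file are hypothetical stand-ins for a downloaded RData file, created locally with save().

```r
# Helper from the documentation snippet: load an RData file into a
# private environment and return the first object found there.
load_object <- function(file) {
  tmp <- new.env()
  load(file = file, envir = tmp)
  tmp[[ls(tmp)[1]]]
}

# Hypothetical stand-in for a downloaded file: one data frame saved locally.
county <- data.frame(fips = c("41051", "41067"), votes = c(10L, 20L))
rdata_path <- file.path(tempdir(), "county.RData")
save(county, file = rdata_path)

# Unlike a bare load(), the result can be assigned to any name.
restored <- load_object(rdata_path)
```

Because ls() sorts names alphabetically, a file holding several objects would yield only the alphabetically first one, which is why the helper is only appropriate when each file is known to contain a single object.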
Paul Gronke Professor, Reed College Director, Elections and Voting Information Center http://evic.reed.edu
I don't think this solves it, though. That help documentation is somewhat incomplete or incorrect.
First, the 5VPXKE file is not found using the function (I receive a "file information not found on Dataverse API" error), although I can find the name and type of the file in version 2 of https://demo.dataverse.org/file.xhtml?fileId=1939003&version=3.0. That file is nlsw88_rda-export.rda, which is specifically an rda file and thus already an R-based file. The original question is why the tab file cannot be downloaded in RData format (more on that below). The help file also specifies
writeBin(as_binary, path(temp, "county.RData"))
load(path(temp, "county.RData"))
but there is no path() function in base R (presumably this should be file.path()), and the rda in question does not have the name County (at least I'm not seeing that). Considering this is already an R-formatted file, this simply works (no need for any of the writeBin steps; X2FC5V is the same file but in version 3 of that demo publication):
get_dataframe_by_doi(
filedoi = "10.70122/FK2/PPIAXE/X2FC5V",
server = "demo.dataverse.org",
original = TRUE,
.f = function(x) load(x, envir = .GlobalEnv))
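For anyone who still wants the write-to-disk route from the help file, the corrected path handling can be sketched offline. This is a sketch under assumptions: the raw vector `as_binary` below is built locally to stand in for downloaded bytes (the help file's use of writeBin() implies raw bytes), and the object `nlsw` is hypothetical.

```r
# Hypothetical stand-in for the raw bytes a download would return:
# save a small object, then read the file back as a raw vector.
nlsw <- data.frame(id = 1:3, wage = c(11.7, 5.5, 8.2))
src <- tempfile(fileext = ".rda")
save(nlsw, file = src)
as_binary <- readBin(src, what = "raw", n = file.size(src))

# file.path() (base R) replaces the undefined path() call.
dest <- file.path(tempdir(), "nlsw88_rda-export.rda")
writeBin(as_binary, dest)

# load() returns the names of the restored objects, not the objects,
# so load into an environment and look the object up by name.
env <- new.env()
loaded_names <- load(dest, envir = env)
```

Loading into a fresh environment rather than .GlobalEnv avoids silently overwriting an existing object that happens to share a name with one in the file.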
The original question was why a tab file can be downloaded in RData format from the Dataverse website but not through the R functions. In fact, the https://doi.org/10.7910/DVN/SPU2XP MITU0042_OUTPUT_0120.tab file cannot be downloaded in RData format from the website either, due to the 404 error message above. I found others with the same error message (for example, https://doi.org/10.7910/DVN/ONZOPT gets the same error when trying to download in RData format from the website). This https://doi.org/10.7910/DVN/NKN0E8/Y2HP2J Data Set I.tab file, however, downloads fine in RData format from the website and can be loaded into R. But trying this using the R dataverse code does not work and produces the error:
Error in load(x, envir = .GlobalEnv) :
bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning messages:
1: In readChar(con, 5L, useBytes = TRUE) :
truncating string with embedded nuls
2: file ‘foo498c108a300’ has magic number 's'
Use of save versions prior to 2 is deprecated
A quick Google search on the last error message suggests that R 3.5.0 and later save RData files in serialization version 3 by default, while earlier releases save version 2 and cannot load version 3 files. (Examples: https://stackoverflow.com/questions/12463583/the-cause-of-bad-magic-number-error-when-loading-a-workspace-and-how-to-avoid and https://stackoverflow.com/questions/57242296/workspace-cannot-be-loaded-in-server-file-has-magic-number-rdx3)
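One way to check which case applies is to read the same 5-byte header that load() inspects. The sketch below is a hedged illustration, not part of the dataverse package: `rdata_magic` is a hypothetical helper name, the files are created locally, and `version = 3` assumes R 3.5.0 or later.

```r
# Read the 5-byte header that load() checks. gzfile() transparently
# handles both gzip-compressed and uncompressed files.
rdata_magic <- function(path) {
  con <- gzfile(path, open = "rb")
  on.exit(close(con))
  readChar(con, nchars = 5L, useBytes = TRUE)
}

# Two local test files, one per serialization version.
x <- 1:3
f2 <- tempfile(fileext = ".RData")
f3 <- tempfile(fileext = ".RData")
save(x, file = f2, version = 2)
save(x, file = f3, version = 3)

magic_v2 <- rdata_magic(f2)  # "RDX2\n"
magic_v3 <- rdata_magic(f3)  # "RDX3\n"
```

A header that matches neither pattern, such as the 's' reported in the warning above, would suggest that the downloaded bytes are not in a save() format that load() recognizes at all, rather than being a version 2 versus version 3 mismatch.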
I'll contact support@dataverse to see whether they find anything odd with the offering of RData files (perhaps they are using older versions to produce the RData files) and why some show the 404 error.
Ah, this makes sense:
https://github.com/IQSS/dataverse/issues/9490#issuecomment-1492640510
Unfortunately, this download-as-RData support, that uses a remote R instance via Rserve, is just flaky and unreliable. The whole subsystem is rather obsolete by now, and we are seriously considering retiring it. There's some lively debate (including in the Dataverse users group right now) about whether this "download-as-RData" functionality is actually providing any useful value. (If the ingested original was a Stata file, any R user can easily download the .dta file and import it into R - which has excellent support for Stata via the package "foreign"; if the original was RData... the whole point is moot; etc. etc.) Originally posted by @landreev in https://github.com/IQSS/dataverse/issues/9490#issuecomment-1492640510
https://github.com/IQSS/dataverse/issues/8711#issue-1239788342
The option to download tabular data in RData format should not appear in the dropdown menu if a Dataverse installation has not been configured to handle RData
Aside from updating the help documentation to load a published rda file, this issue here could pretty much be closed I guess.
@Danny-dK that method load(x, envir = .GlobalEnv) is better -- thanks. It is in dev now (#107). And yes, path should have been file.path or fs::path.
As for @paulgronke's original dataset, as an R user I don't see the advantage of loading an SPSS file like MITU0042_OUTPUT_0120.sav as an RData object rather than as a sav file or an ingested plain-text file. Paul's first example with haven::read_sav seems superior in all respects.
dataverse aside, I don't see how a binary/sav/text file can be loaded as an rda file without first going through the sav/text version. So I think it's fine that Danny's example with MITU0042_OUTPUT_0120 is not working. I guess Rserve does some transformations that make it happen, but I don't know that system.
Yes, Dataverse uses RServe to create an RData file out of the tab-separated version.
@kuriwaki Thanks!
Indeed, I don't particularly see the need for conversion to RData / rda through Dataverse. R and its various libraries are perfectly capable of reading in various formats themselves. I agree with the discussions in the aforementioned GitHub issue links. Thanks for the work!
In the download vignette, there is a section titled "Retrieving Custom Data Formats (RDS, Stata, SPSS)" that works as described.
But there is no description of how to download a file in RData format. Is this possible?
The code successfully downloads the .sav file, but I cannot figure out how to load an RData file so as to avoid the extra step of using haven.