IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r

Improve doc on how to read objects without object assignment #107

Closed (kuriwaki closed 2 months ago)

kuriwaki commented 2 years ago

RData files cannot be read in as a single object: load() does not return the data but restores the saved objects, under their original names, into the user's environment. I think we should all be switching to .rds (see https://github.com/IQSS/dataverse/issues/7249), but some files on Dataverse are nonetheless uploaded as .RData.
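To make the difference concrete, here is a minimal base-R sketch that needs no network access. The data frame and the temporary file names are made up for illustration and have nothing to do with the Dataverse files discussed in this issue.

```r
# Base R only; `df`, `rdata_path` and `rds_path` are illustrative names.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

rdata_path <- tempfile(fileext = ".RData")
rds_path   <- tempfile(fileext = ".rds")

save(df, file = rdata_path)   # stores the object together with its name
saveRDS(df, file = rds_path)  # stores the object only

# readRDS() returns the object, so it can be assigned to any name.
from_rds <- readRDS(rds_path)

# load() returns (invisibly) the names of the restored objects, not the data;
# the objects themselves are created in the target environment.
scratch <- new.env()
loaded_names <- load(rdata_path, envir = scratch)

identical(from_rds, df)
identical(scratch[[loaded_names]], df)
```

This is why an .rds file fits naturally into a reader function that returns a value, while an .RData file needs an extra step to pull the object back out.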

It turns out there are two ways to load such a file. One is the old way: write the raw binary to disk and re-read it with load(). The other is to load the file into a small temporary environment inside a function, as I found on Stack Overflow. Both are shown in the reprex below, and they give identical objects.

We should update the doc with an example.

h/t @jonrobinson2

library(dataverse)
library(fs)

# Algara dataset
# https://dataverse.harvard.edu/file.xhtml?fileId=5028532&version=1.0

# 1. downloading the raw binary, writing it to disk, and load()-ing it works
as_binary <- get_file_by_id(fileid = 5028532, server = "dataverse.harvard.edu")

temp <- tempdir()
writeBin(as_binary, path(temp, "county.RData"))
load(path(temp, "county.RData"))

str(pres_elections_release)
#> 'data.frame':    113756 obs. of  20 variables:
#>  $ election_year                        : num  1868 1872 1876 1880 1884 ...
#>  $ fips                                 : chr  "01001" "01001" "01001" "01001" ...
#>  $ county_name                          : chr  "AUTAUGA" "AUTAUGA" "AUTAUGA" "AUTAUGA" ...
#>  $ state                                : chr  "AL" "AL" "AL" "AL" ...
#>  $ sfips                                : chr  "01" "01" "01" "01" ...
#>  $ office                               : chr  "PRES" "PRES" "PRES" "PRES" ...
#>  $ election_type                        : chr  "G" "G" "G" "G" ...
#>  $ seat_status                          : chr  "Open Seat" "Republican President Re-election" "Open Seat" "Open Seat" ...
#>  $ democratic_raw_votes                 : num  851 669 804 978 911 ...
#>  $ dem_nominee                          : chr  "Horatio Seymour" "Horace Greeley" "Samuel J. Tilden" "Winfield Scott Hancock" ...
#>  $ republican_raw_votes                 : num  1505 1593 1576 974 877 ...
#>  $ rep_nominee                          : chr  "Ulysses S. Grant" "Ulysses S. Grant" "Rutherford B. Hayes" "James A. Garfield" ...
#>  $ pres_raw_county_vote_totals_two_party: num  2356 2262 2380 1952 1788 ...
#>  $ raw_county_vote_totals               : num  2356 2262 2380 1967 1789 ...
#>  $ county_first_date                    : Date, format: "1818-11-21" "1818-11-21" ...
#>  $ county_end_date                      : Date, format: NA NA ...
#>  $ state_admission_date                 : chr  "1819-12-14" "1819-12-14" "1819-12-14" "1819-12-14" ...
#>  $ complete_county_cases                : num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ original_county_name                 : chr  NA NA NA NA ...
#>  $ original_name_end_date               : Date, format: NA NA ...

# 2. How about reading it directly into R? This is an RData file, which we usually read with load().

# via: https://stackoverflow.com/questions/34925668/r-assign-content-from-rda-object-with-load
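load_object <- function(file) {
  tmp <- new.env()                # scratch environment, so nothing leaks into the workspace
  load(file = file, envir = tmp)  # restore the saved objects into it
  tmp[[ls(tmp)[1]]]               # return the first object by name (ls() sorts alphabetically)
}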

as_rda <- get_dataframe_by_id(fileid = 5028532,
                              server = "dataverse.harvard.edu", 
                              .f = load_object, 
                              original = TRUE)

identical(as_rda, pres_elections_release)
#> [1] TRUE

Created on 2021-09-16 by the reprex package (v2.0.1)
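The load_object() helper above returns only the first object in the file. As a rough sketch of how that could be generalized, the variant below returns the object itself when the file holds one object and a named list when it holds several. `load_rdata` is a hypothetical name, not a function in the dataverse package, and the example uses throwaway local files rather than a Dataverse download.

```r
# Hypothetical helper (not part of the dataverse package): load an .RData
# file into a private environment and return its contents.
load_rdata <- function(file) {
  env <- new.env(parent = emptyenv())
  obj_names <- load(file, envir = env)  # load() returns the restored names
  if (length(obj_names) == 1) {
    env[[obj_names]]                    # single object: return it directly
  } else {
    mget(obj_names, envir = env)        # several objects: return a named list
  }
}

# Local check with throwaway objects, no network access needed.
a <- 1:5
b <- letters[1:3]
one_path  <- tempfile(fileext = ".RData")
many_path <- tempfile(fileext = ".RData")
save(a, file = one_path)
save(a, b, file = many_path)

single  <- load_rdata(one_path)   # the integer vector itself
several <- load_rdata(many_path)  # a list with elements "a" and "b"
```

Assuming it behaves like load_object(), such a helper could be passed as `.f = load_rdata` together with `original = TRUE`, in the same way as in the reprex.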

kuriwaki commented 10 months ago

@Danny-dK's proposal is more concise:

get_dataframe_by_doi(
  filedoi = "10.70122/FK2/PPIAXE/X2FC5V",
  server = "demo.dataverse.org",
  original = TRUE,
  .f = function(x) load(x, envir = .GlobalEnv))
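One thing to keep in mind with this form: load() returns the names of the restored objects, not the objects, so the call is useful for its side effect of creating objects in the global environment. Presumably the value returned by get_dataframe_by_doi() is then whatever `.f` returns, i.e. a character vector of names. The local sketch below illustrates the behaviour of the reader function only; `demo_payload` and `reader` are made-up names.

```r
# Illustrative only: a throwaway object saved to a temporary .RData file.
demo_payload <- data.frame(id = 1:2)
path <- tempfile(fileext = ".RData")
save(demo_payload, file = path)
rm(demo_payload)

# Same shape as the `.f` argument in the call above.
reader <- function(x) load(x, envir = .GlobalEnv)
result <- reader(path)

result                                      # names of the restored objects
exists("demo_payload", envir = .GlobalEnv)  # the object is back in the workspace
```

Because the object keeps the name it was saved under, it can silently overwrite an existing object of the same name in the workspace.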

I have made this change in dev: f33e578217547f5f465bdb5f50d2b347df4fa18a

kuriwaki commented 2 months ago

Implemented in 0.3.14