IQSS / dataverse-client-r

R Client for Dataverse Repositories
https://iqss.github.io/dataverse-client-r

Reading Stata suggestion - change foreign to haven? #34

Closed kuriwaki closed 2 years ago

kuriwaki commented 4 years ago

In the current man pages and vignette, the examples for dta files suggest foreign::read.dta. I propose switching to haven::read_dta, or at least checking whether all the tests pass with haven. haven is a tidyverse package that has surpassed foreign in CRAN downloads in recent years (see below). More importantly, haven can read all Stata dataset versions, whereas foreign is stuck in v12 (Stata is currently at v16).

library(ggplot2)
library(dplyr)
library(dlstats)

dl_stats <- cran_stats(c("haven", "foreign"))

dl_stats %>% 
  as_tibble() %>% 
  group_by(package) %>% 
  slice(-n()) %>% 
  rename(Package = package) %>% 
  ggplot(aes(end, downloads, group = Package, color = Package)) +
  geom_line(aes(linetype = Package)) + geom_point() +
  labs(y = "CRAN downloads",
       x = "")

Created on 2019-12-09 by the reprex package (v0.3.0)

wibeasley commented 4 years ago

TLDR: yes, I like it and think we should update the vignette & examples.

Thanks @kuriwaki for the thoughtful argument. I don't know much about Stata. Is it backwards compatible, in the sense that newer software versions can still read older datasets while older software has trouble reading newer ones? If so, your last point is very convincing and outweighs everything else. Since version 13 was released 6.5 years ago, that potentially leaves out a lot of Dataverse files.

More importantly, haven can read all Stata dataset versions, whereas foreign is stuck in v12 (Stata is currently at v16).

Even though I'd use haven without hesitation in my own projects, when deploying packages I'd lean towards packages developed and maintained by the R Core Team itself. But it sounds like foreign is not being maintained, at least regarding Stata versions?

wibeasley commented 4 years ago

@kuriwaki, I had another thought that supports your proposal to switch to haven. The examples & vignettes are more than just operational code. They should provide a good springboard for new Dataverse users. Ideally, users should be able to move easily from our pre-set ideas in the vignette to something specific to their scenario. I'm guessing haven is a better platform than foreign (a) to add wrinkles to a Stata-based project and (b) to adapt the vignette/example to SAS/SPSS/CSV/whatever.
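As a rough illustration of point (b), here is a minimal sketch assuming haven is installed. The data frame and its column names are made up for this example; the point is only that haven's reader and writer for SPSS mirror the Stata ones, so swapping formats mostly means swapping the function name.

```r
library(haven)

# Hypothetical toy data, used only for illustration.
df <- data.frame(id = c(1, 2, 3), score = c(2.5, 3.0, 4.5))

dta_path <- tempfile(fileext = ".dta")
sav_path <- tempfile(fileext = ".sav")

# Same calling pattern for Stata and SPSS files.
write_dta(df, dta_path)
write_sav(df, sav_path)

from_dta <- read_dta(dta_path)
from_sav <- read_sav(sav_path)
```

Both readers return tibbles, so downstream code in a vignette would not need to change with the file format.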

kuriwaki commented 4 years ago

Yes, Stata is backwards compatible regarding its datasets.

foreign's help page says the read.dta function is "Frozen" at v12, so it looks unlikely that will change. dataverse already will ingest Stata datasets as new as v15, so I suspect foreign will have trouble there (#33).
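A minimal sketch of the version limit, assuming both haven and foreign are installed. The toy data and the helper foreign_can_read are hypothetical names for this example. It writes the same data frame in a Stata 12 and a Stata 14 format with haven::write_dta, then checks whether foreign::read.dta can open each one.

```r
library(haven)

# Hypothetical toy data, used only for illustration.
df <- data.frame(id = c(1, 2, 3), score = c(2.5, 3.0, 4.5))

old_path <- tempfile(fileext = ".dta")
new_path <- tempfile(fileext = ".dta")
write_dta(df, old_path, version = 12)  # format within foreign's supported range
write_dta(df, new_path, version = 14)  # newer format, past the "Frozen" cutoff

# Hypothetical helper: TRUE if foreign::read.dta reads the file without error.
foreign_can_read <- function(path) {
  tryCatch({
    foreign::read.dta(path)
    TRUE
  }, error = function(e) FALSE)
}

old_ok <- foreign_can_read(old_path)
new_ok <- foreign_can_read(new_path)

# haven reads both formats.
from_old <- read_dta(old_path)
from_new <- read_dta(new_path)
```

I would expect old_ok to be TRUE and new_ok to be FALSE, since foreign only documents support for versions 5 to 12.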

There are some differences between the outputs of foreign::read.dta and haven::read_dta. For example, labelled/encoded variables are read as factors in the former and as labelled integers in the latter. Not sure if that's relevant to this package though.
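A minimal sketch of that difference, assuming haven, tibble, and foreign are installed. The variable vote and its labels are made up for this example, and the file is written in a Stata 12 format so that foreign can read it too.

```r
library(haven)

# Hypothetical labelled variable: 1 = "yes", 2 = "no".
df <- tibble::tibble(
  vote = labelled(c(1, 2, 1), labels = c(yes = 1, no = 2))
)

path <- tempfile(fileext = ".dta")
write_dta(df, path, version = 12)

from_foreign <- foreign::read.dta(path)  # value labels become factor levels
from_haven   <- read_dta(path)           # values stay numeric, labels kept as an attribute

class(from_foreign$vote)
class(from_haven$vote)

# haven::as_factor converts a labelled vector to a factor when that is what you want.
vote_factor <- as_factor(from_haven$vote)
```

So code that expects factors from foreign::read.dta would need an explicit as_factor() step after switching to haven.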

pdurbin commented 4 years ago

dataverse already will ingest Stata datasets as new as v15

@kuriwaki thanks for opening https://github.com/IQSS/dataverse/issues/6444 about fixing the Dataverse docs about which version of Stata is supported! 🎉