More Data Science with R
François Briatte
Spring 2024. Work VERY MUCH in progress.
A follow-up to an introduction to data science with R, RStudio, and the {tidyverse} packages, still aimed at social scientists. This course requires some prior training in introductory statistics and regression modelling.
N.B. -- the current repo does not include the full set of datasets used during the semester, which are all publicly available. Future versions will include the full data and slides.
Bonus sections:
A session to get started again with R and RStudio, this time through R Markdown notebooks, which are dynamic documents that can combine text and images with code as well as plots and other kinds of results.
Demo: LGBTI inclusivity in OECD countries
{tidyverse} package bundle
A general-revisions session that covers data wrangling and visualization with various packages of the tidyverse bundle. Now is the right time to take a look at cheatsheets and similar material.
Demo: U.S. life expectancy (code by Kieran Healy)
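A minimal refresher of the wrangle-then-plot workflow revised in this session could look like the sketch below. It uses the built-in `mtcars` data as a stand-in, not the life expectancy data from the demo, and assumes that the {tidyverse} is installed.

```r
library(tidyverse)

# wrangling: one row per number of cylinders, with counts and average mileage
cars_by_cyl <- mtcars |>
  mutate(cyl = factor(cyl)) |>
  group_by(cyl) |>
  summarise(n = n(), mean_mpg = mean(mpg), .groups = "drop")

print(cars_by_cyl)

# visualization: build a scatter plot with one colour per group
# (print `p` to draw it)
p <- mtcars |>
  mutate(cyl = factor(cyl)) |>
  ggplot(aes(x = wt, y = mpg, colour = cyl)) +
  geom_point() +
  labs(x = "Weight (1,000 lbs)", y = "Miles per gallon", colour = "Cylinders")
```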
{dplyr}, {dbplyr} and {stringr}
A session focused on advanced data wrangling. SQL databases, in particular, are what you will need for speed and/or out-of-memory computation on very (possibly very, very) large data.
Demo: Government cabinet composition (ParlGov data, code by Holger Döring)
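The SQL databases mentioned above can be queried with {dplyr} verbs through {dbplyr}, which translates them into SQL. The sketch below assumes that {DBI}, {RSQLite} and {dbplyr} are installed. It uses a temporary in-memory SQLite database filled with `mtcars` as a stand-in for a real database such as ParlGov, and the table name `cars` is arbitrary.

```r
library(DBI)
library(dplyr)  # {dbplyr} must also be installed for tbl() to work on a connection

# temporary in-memory database, filled with a copy of `mtcars`
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "cars", mtcars)

# lazy reference to the table: nothing is read into R at this point
cars_db <- tbl(con, "cars")

query <- cars_db |>
  group_by(cyl) |>
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE)) |>
  arrange(cyl)

show_query(query)         # the SQL translation of the dplyr code
result <- collect(query)  # run the query in the database, return a tibble

dbDisconnect(con)
```

The computation happens inside the database until `collect()` is called, which is what makes this approach suitable for data that do not fit in memory.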
{httr}, {rvest} and {xml2}
Another session focused on advanced data wrangling. Web scraping is what you will need if your data are trapped in Web pages.
Demo: Locating nuclear reactors worldwide (data from the IAEA)
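A minimal {rvest} sketch of the scraping workflow: select nodes with CSS selectors, then extract their text, attributes or tables. So that it runs offline, it parses a hypothetical HTML snippet with `minimal_html()`; on a live page, `read_html()` with a URL would take its place.

```r
library(rvest)

# hypothetical toy page, parsed from a string (no network access needed)
page <- minimal_html('
  <ul>
    <li><a href="/reports/1">First report</a></li>
    <li><a href="/reports/2">Second report</a></li>
  </ul>
  <table>
    <tr><th>name</th><th>value</th></tr>
    <tr><td>a</td><td>1</td></tr>
    <tr><td>b</td><td>2</td></tr>
  </table>
')

links <- html_elements(page, "a")      # all <a> nodes, via a CSS selector
link_text <- html_text2(links)         # visible text of each link
link_href <- html_attr(links, "href")  # value of each href attribute

# parse the first <table> into a tibble (the <th> row becomes the header)
tab <- page |>
  html_element("table") |>
  html_table()
```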
{broom} and {performance}
Mostly revisions of what was covered in the introductory course.
Demo: Worldwide fertility rates (QOG/World Bank data)
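As a reminder of how these two packages fit together, here is a sketch that fits an ordinary least squares model on the built-in `mtcars` data (a stand-in for the QOG/World Bank data) and summarises it as tidy data frames. The choice of predictors is arbitrary.

```r
library(broom)
library(performance)

fit <- lm(mpg ~ wt + hp, data = mtcars)

tidy(fit, conf.int = TRUE)  # one row per coefficient, with 95% confidence intervals
glance(fit)                 # one row of model-level statistics (R², AIC, etc.)
r2(fit)                     # R² and adjusted R², from {performance}
model_performance(fit)      # several fit indices in a single table
```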
{fixest} and {plm}
Demo: Worldwide fertility rates (QOG/World Bank data)
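Both packages estimate fixed-effects models on panel data. The sketch below, which assumes that the {plm} example dataset `Grunfeld` is an acceptable stand-in for country-year data, fits the same two-way (unit and time) fixed-effects model with each package; the point estimates should agree.

```r
library(fixest)
library(plm)

data("Grunfeld", package = "plm")  # balanced panel: 10 firms over 20 years

# two-way fixed effects (firm and year) with {fixest}
fe_fixest <- feols(inv ~ value + capital | firm + year, data = Grunfeld)

# the same "within" estimator with {plm}
fe_plm <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"), model = "within", effect = "twoways")

# identical point estimates; the default standard errors are computed differently
cbind(fixest = coef(fe_fixest), plm = coef(fe_plm))
```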
{survey} and {srvyr}
Demo: EU skepticism and migration (ESS data, code by Holger Döring)
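Survey data such as the ESS come with sampling weights, and {srvyr} offers a {dplyr}-like interface to the {survey} package for using them. This sketch uses the `apistrat` example data that ships with {survey} (a stratified sample of Californian schools), not the ESS, and cross-checks the {srvyr} estimate against {survey}.

```r
library(survey)
library(srvyr)

data(api, package = "survey")  # loads `apistrat`, among other example datasets

# declare the design: strata, sampling weights, finite population corrections
des <- apistrat |>
  as_survey_design(strata = stype, weights = pw, fpc = fpc)

# weighted mean of the 2000 API score, with its design-based standard error
overall <- des |> summarise(api00 = survey_mean(api00))

# the same estimate, by school type
by_type <- des |> group_by(stype) |> summarise(api00 = survey_mean(api00))

# cross-check with {survey} itself
ref <- svymean(~api00, design = svydesign(ids = ~1, strata = ~stype,
                                          weights = ~pw, fpc = ~fpc,
                                          data = apistrat))
ref
```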
Feedback on your first drafts, and recommendations for the coming weeks.
{lme4}
Demo: EU skepticism and migration, continued (ESS data, code by Holger Döring)
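The nested structure of ESS data (respondents within countries) is what multilevel models handle. As a stand-in, this sketch uses the `sleepstudy` data that ships with {lme4}, where repeated measurements are nested within subjects, and fits random intercepts and random slopes.

```r
library(lme4)

# `sleepstudy`: reaction times of 18 subjects over 10 days of sleep deprivation
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

fixef(m)                # average intercept and slope across subjects
VarCorr(m)              # between-subject variation in intercepts and slopes
head(ranef(m)$Subject)  # subject-specific deviations from the averages
```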
{tidymodels} package bundle
Demo: White Trump voters (CCES data, code by Steven Miller)
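The core {tidymodels} workflow consists of splitting the data, specifying and fitting a model on the training set, and evaluating it on held-out data. The sketch below uses `mtcars` and a linear regression as placeholders for the CCES data and the models used in the demo; the seed value is arbitrary.

```r
library(tidymodels)

set.seed(2024)  # arbitrary seed, for a reproducible split
cars_split <- initial_split(mtcars, prop = 0.75)
train_set <- training(cars_split)
test_set  <- testing(cars_split)

# specify the model type and engine, then fit it on the training set only
lm_fit <- linear_reg() |>
  set_engine("lm") |>
  fit(mpg ~ wt + hp, data = train_set)

# predict on held-out data, then compute out-of-sample metrics (RMSE, R², MAE)
preds <- predict(lm_fit, new_data = test_set) |>
  bind_cols(test_set)

metrics(preds, truth = mpg, estimate = .pred)
```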
scikit-learn library
Demo: Trump tweets (Twitter data, code by Bernhard Rieder)
{flexdashboard} package, {sf} and Leaflet
Demo: Worldwide air pollution (World Bank data, code by Paul Moraga)
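A minimal sketch of how {sf} and {leaflet} work together: turn a data frame of coordinates into a spatial object, compute with it, then map it. The two points are approximate city-centre coordinates chosen for illustration only, not data from the demo.

```r
library(sf)
library(leaflet)

# illustrative points, with longitude/latitude in degrees
places <- data.frame(
  name = c("Paris", "Berlin"),
  lon = c(2.35, 13.40),
  lat = c(48.86, 52.52)
)

# convert to an sf object; EPSG:4326 is the usual longitude/latitude system
places_sf <- st_as_sf(places, coords = c("lon", "lat"), crs = 4326)

# distance matrix between the points, in metres
d <- st_distance(places_sf)
d

# interactive map: leaflet() reads the geometry of the sf object
# (print `map` to display it in RStudio or in an R Markdown document)
map <- leaflet(places_sf) |>
  addTiles() |>
  addMarkers(popup = ~name)
```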
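The following code installs the packages used in the course with the {remotes} package:

```r
# install.packages("remotes") # run once if {remotes} is not installed yet

# data import, wrangling and spatial data
pkg_data <- c("countrycode", "rsdmx", "RSQLite", "sf", "tidyverse")
# ... also installs {DBI} and {rvest}, inter alia

# statistical modelling
pkg_models <- c("easystats", "lme4", "plm", "fixest", "tidymodels")
# ... installs a lot of essentials, such as {performance}

# model tables and interactive data tables
pkg_tables <- c("broom", "broom.mixed", "DT", "modelsummary", "texreg")

# dashboards and interactive maps
pkg_varia <- c("flexdashboard", "leaflet")

# install_cran() is vectorised, and skips packages that are already up to date
remotes::install_cran(c(pkg_data, pkg_models, pkg_tables, pkg_varia))
```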
The DSR README has a list of relevant credits.
More to come.