briatte / mdsr

More Data Science with R (2024)
https://f.briatte.org/

> More Data Science with R

François Briatte
Spring 2024. Work VERY MUCH in progress.

A follow-up to an introduction to data science with R, RStudio, and the {tidyverse} packages, still aimed at social scientists. This course requires some prior training in introductory statistics and regression modelling.

N.B. -- the current repo does not include the full set of datasets used during the semester, which are all publicly available. Future versions will include the full data and slides.

Outline

  1. Software
  2. Revisions
  3. SQL databases
  4. Web scraping
  5. Linear models
  6. Panel data
  7. Survey data
  8. Feedback
  9. Multilevel data
  10. Machine learning in R
  11. Machine learning in Python
  12. Dashboards

Bonus sections:

1. Software

A session to get started again with R and RStudio, this time through R Markdown notebooks, which are dynamic documents that can combine text and images with code as well as plots and other kinds of results.

> Demo: LGBTI inclusivity in OECD countries

2. Revisions

A general-revisions session that covers data wrangling and visualization with various packages of the tidyverse bundle. Now is the right time to take a look at cheatsheets and similar material.

> Demo: U.S. life expectancy (code by Kieran Healy)

3. SQL databases

A session focused on advanced data wrangling. SQL databases, in particular, are what you will need when you need speed and/or out-of-memory computation on very (possibly very, very) large data.
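
As a rough illustration of why a database helps here, the sketch below sends an aggregation to SQLite through {DBI} and {RSQLite}, so that only the summary comes back into R. The `cabinets` table and its values are hypothetical placeholders, not the ParlGov data used in the demo, and the in-memory database stands in for what would normally be a file on disk.

```r
library(DBI)

# In-memory SQLite database; pass a file path instead for on-disk data
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Toy table (hypothetical values, for illustration only)
cabinets <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  party   = c("p1", "p2", "p1", "p3", "p4"),
  seats   = c(120, 45, 200, 30, 12)
)
dbWriteTable(con, "cabinets", cabinets)

# The aggregation runs inside the database; R only receives the result
res <- dbGetQuery(con, "
  SELECT country, COUNT(*) AS n_parties, SUM(seats) AS total_seats
  FROM cabinets
  GROUP BY country
  ORDER BY country
")

dbDisconnect(con)
```

With a large on-disk database, the same pattern keeps the raw rows out of memory: the query returns one row per country, however many rows the table holds.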

> Demo: Government cabinet composition (ParlGov data, code by Holger Döring)

4. Web scraping

Another session focused on advanced data wrangling. Web scraping is what you will need if your data are trapped online in Web pages.
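
A minimal sketch of the idea, using {rvest}: parse a page, select the element that holds the data, and convert it to a data frame. The HTML fragment, the `#reactors` selector and the values are hypothetical and only stand in for a real page; they are not the IAEA source used in the demo.

```r
library(rvest)

# Hypothetical page fragment, standing in for a real Web page
page <- minimal_html('
  <table id="reactors">
    <tr><th>name</th><th>country</th></tr>
    <tr><td>Reactor One</td><td>X</td></tr>
    <tr><td>Reactor Two</td><td>Y</td></tr>
  </table>
')

# Select the table by its CSS id, then parse it into a data frame
reactors <- page |>
  html_element("#reactors") |>
  html_table()
```

For a live page, `read_html()` with the page address would replace `minimal_html()`, and the rest of the pipeline would stay the same.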

> Demo: Locating nuclear reactors worldwide (data from the IAEA)

5. Linear models

Mostly revisions of what was covered in the introductory course.

> Demo: Worldwide fertility rates (QOG/World Bank data)

6. Panel data

> Demo: Worldwide fertility rates (QOG/World Bank data)

7. Survey data

> Demo: EU skepticism and migration (ESS data, code by Holger Döring)

8. Feedback

Feedback on your first drafts, and recommendations for the coming weeks.

9. Multilevel data

> Demo: EU skepticism and migration, continued (ESS data, code by Holger Döring)

10. Machine learning in R

> Demo: White Trump voters (CCES data, code by Steven Miller)

11. Machine learning in Python

> Demo: Trump tweets (Twitter data, code by Bernhard Rieder)

12. Dashboards

> Demo: Worldwide air pollution (World Bank data, code by Paul Moraga)


Dependencies

pkg_data <- c("countrycode", "rsdmx", "RSQLite", "sf", "tidyverse")
# ... also installs {DBI} and {rvest}, inter alia
pkg_models <- c("easystats", "lme4", "plm", "fixest", "tidymodels")
# ... installs a lot of essentials, such as {performance}
pkg_tables <- c("broom", "broom.mixed", "DT", "modelsummary", "texreg")
pkg_varia <- c("flexdashboard", "leaflet")

# install {remotes} first if it is missing
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
for (i in c(pkg_data, pkg_models, pkg_tables, pkg_varia)) {
  remotes::install_cran(i)
}

Credits

The DSR README has a list of relevant credits.

Elsewhere

More to come.