briatte / srqm

An introductory statistics course for social scientists, using Stata
https://f.briatte.org/teaching/quanti/

Dataset updates #30

Open briatte opened 3 years ago

briatte commented 3 years ago

Closes #21, #22 and #23 (copied below), and #27.

Update from 2023

Stop updating the data, really.

Detailed notes

Note on QOG: in 2023, it offers only this as a replacement, which is not ideal:

// fertility against school life expectancy, all countries, with linear fit
sc wdi_fertility wef_lse, ms(i) mlab(ccodealp) || lfit wdi_fertility wef_lse, ///
    name(g1, replace)
// same linear fit, but showing SSA data points only (ht_region == 4): underpredicted
sc wdi_fertility wef_lse if ht_region == 4, ms(i) mlab(ccodealp) || ///
    lfit wdi_fertility wef_lse, ///
    name(g2, replace)
// all regions: one graph per region, each against the overall linear fit
forv i = 1/10 {
    sc wdi_fertility wef_lse if ht_region == `i', ms(i) mlab(ccodealp) || ///
        lfit wdi_fertility wef_lse, ///
        name("region`i'", replace)
}

The plan for 2021:

Additional things to consider:

Dataset names

I like the initial "acronym + year" convention, but it produces strange names for multiple-year survey datasets:

Merged datasets

Is merging still a good idea for, e.g., the ESS? Probably not, especially if we need to cap datasets at 2,048 variables for Stata/IC.
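
A quick way to check whether a merged file stays under that cap might look like the sketch below. The file name is hypothetical, and the check has to be run in a flavour of Stata that can actually open the file (SE or MP), since Stata/IC will refuse to load anything wider than 2,048 variables.

```stata
* Sketch only: "ess_merged.dta" is a hypothetical file name.
* Run this in Stata/SE or MP; Stata/IC cannot load more than 2,048 variables.
use "ess_merged.dta", clear

* -describe- stores the number of variables in r(k)
quietly describe
display "variables in memory: " r(k)

* flag datasets that are too wide for Stata/IC
if r(k) > 2048 {
    display as error "too wide for Stata/IC: " r(k) " variables"
}
```

If the count is over the limit, the merged file would need to be trimmed with keep or drop before it is shipped to students.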

Both WVS and ESS are used to demo keep if inlist(country, …), the other kind of subsetting we want to show.
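
For reference, a minimal sketch of that kind of subsetting. The country codes are purely illustrative, and it assumes that country is a string variable holding two-letter codes, which may not match the actual datasets.

```stata
* Sketch only: the country codes are illustrative assumptions,
* and -country- is assumed to be a string variable.
keep if inlist(country, "DE", "FR", "GB")

* check which countries are left in the subset
tabulate country
```

If country is stored as a labelled numeric variable instead, the same call works with numeric codes in place of the quoted strings.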

Additional datasets

It would make a lot of sense to offer the students more datasets than just those used in the do-files.

Currently, the do-files are selective anyway: we provide ESS 2016 (Round 8) but do not use the data, even though the dependent variable also exists in that round.