bquistorff / synth_runner

A tool to run a pool of synthetic controls, conduct inference, and produce visualizations.

Feature request: trends in different years #33

Open mikewinddale opened 5 years ago

mikewinddale commented 5 years ago

This is a feature request rather than a bug report; I had trouble opening a pull request. I would like the trends option to accept a parameter specifying which year to use for normalization, instead of always using the default, which is the last pre-treatment year.
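To make the request concrete, here is a minimal sketch of normalizing by a chosen year on a made-up toy panel. The variable names (id, y, y_base, y_norm) and the local macro base are hypothetical and are not part of synth_runner; this only illustrates the arithmetic, not how the package implements trends.

```stata
* Toy panel (made-up numbers): two units observed 1878-1881
clear
input id Year y
1 1878 2
1 1879 4
1 1880 8
1 1881 16
2 1878 5
2 1879 10
2 1880 10
2 1881 20
end

* Hypothetical normalization year
local base 1879

* Copy each unit's value in the base year onto all of that unit's rows
bysort id: egen y_base = max(cond(Year == `base', y, .))

* The normalized outcome equals 1 in the base year for every unit
gen y_norm = y / y_base
list id Year y y_norm, sepby(id)
```

With the default behavior, the base year would be fixed at the last pre-treatment year; the request is to let the user pick it.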

I have manually performed this in my do-file. Example code follows. My outcome is cite_smooth_0. Everything else should be self-explanatory.

capture program drop normalized_regression
program define normalized_regression

args year

* cite_smooth_0_in_`year' is equal to cite_smooth_0[X], where X is a row number.
* Within a bysort Name (Year), row number X is determined by counting from row 1 in 1878 until 
* we get to `year'. For example, 1881 is row 4. More generally, the first year is Year[1] 
* rather than 1878. Therefore, (`year' - Year[1] + 1) is the row number of `year', given
* that row 1 is Year[1]. Thus, cite_smooth_0_in_`year' = cite_smooth_0[`year' - Year[1] + 1].
* Note: this assumes every Name has exactly one row per consecutive year, with no gaps.
bysort Name (Year): gen cite_smooth_0_in_`year' = cite_smooth_0[`year' - Year[1] + 1]

* We're going to be normalizing by dividing cite_smooth_0 by the value of cite_smooth_0_in_`year'
* The problem is that a few observations (approximately 10 obs) have 
* values of cite_smooth_0_in_`year' of exactly 0 (zero), so they create division by zero errors. 
* To solve this problem, I'm going to replace zeroes with values equal to the smallest 
* non-zero value (in any year) for that particular author.
*
* We have to perform two steps to find the minimum non-zero value of cite_smooth_0.
* First, "bysort Name: egen min_cite_smooth_0 = min(cite_smooth_0) if cite_smooth_0 > 0" 
* finds the minimum non-zero value, but only for the observations which have a non-zero value.
* That is, for the observations for which cite_smooth_0 == 0, min_cite_smooth_0 
* has a missing value. So then we have to find the mean value of min_cite_smooth_0.
* The mean value will be the mean of missing values and of identical present values,
* and thus the mean value will be equal to the present values - but now every 
* observation has that value, including the observations for which cite_smooth_0 == 0.
bysort Name: egen min_cite_smooth_0 = min(cite_smooth_0) if cite_smooth_0 > 0
bysort Name: egen min_cite_smooth_0_m = mean(min_cite_smooth_0)
replace cite_smooth_0_in_`year' = min_cite_smooth_0_m if cite_smooth_0_in_`year' == 0

gen cite_smooth_0_norm = cite_smooth_0 / cite_smooth_0_in_`year'

synth cite_smooth_0_norm ///
    YearofPublication English YearofTranslationtoEnglish Socialist Political ///
    cite_smooth_0_norm(1914(1)1916) cite_smooth_0_norm(1908(1)1910) cite_smooth_0_norm(1902(1)1904) cite_smooth_0_norm(1896(1)1898) ///
    cite_smooth_0_norm(1890(1)1892) cite_smooth_0_norm(1884(1)1886) cite_smooth_0_norm(1878(1)1880), ///
    trunit(28) trperiod(1917) resultsperiod(1878(1)1932) mspeperiod(1878(1)1916) ///
    fig keep("synth_results") replace

synth_runner cite_smooth_0_norm ///
    YearofPublication English YearofTranslationtoEnglish Socialist Political ///
    cite_smooth_0_norm(1914(1)1916) cite_smooth_0_norm(1908(1)1910) cite_smooth_0_norm(1902(1)1904) cite_smooth_0_norm(1896(1)1898) ///
    cite_smooth_0_norm(1890(1)1892) cite_smooth_0_norm(1884(1)1886) cite_smooth_0_norm(1878(1)1880), ///
    trunit(28) trperiod(1917) mspeperiod(1878(1)1916) /// 
    gen_vars ///
    keep("synth_runner_results") replace ///
    parallel

* Cleanup
drop lead cite_smooth_0_norm_synth effect pre_rmspe post_rmspe
parallel clean

drop cite_smooth_0_in_`year'
drop min_cite_smooth_0 min_cite_smooth_0_m
drop cite_smooth_0_norm
end
end

foreach year in 1879 1881 1884 1886 1889 1891 1894 1896 1899 1901 1904 1906 1909 1911 1914 1916 {

    normalized_regression `year'
}
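One caveat about the row-number lookup used in the program: it only lands on the right row when every unit has one observation per consecutive year. The sketch below, on made-up toy data with hypothetical variable names (id, y, y_rowidx, y_base, y_minpos, y_norm), shows a possible alternative that matches on the year value instead, and that finds the smallest positive value in a single egen call. It is an illustration under those assumptions, not a tested replacement for the do-file above.

```stata
* Toy panel (made-up numbers): unit 2 has no 1879 row, unit 1 is zero in 1880
clear
input id Year y
1 1878 3
1 1879 2
1 1880 0
1 1881 6
2 1878 4
2 1880 8
2 1881 12
end

* Hypothetical normalization year
local base 1880

* Row arithmetic as in the program above: for unit 2 this picks up the 1881 row
bysort id (Year): gen y_rowidx = y[`base' - Year[1] + 1]

* Matching on the year value is unaffected by the gap
bysort id: egen y_base = max(cond(Year == `base', y, .))

* Smallest strictly positive value per unit, filled in on every row of the unit
bysort id: egen y_minpos = min(cond(y > 0, y, .))

* Replace exact zeros in the base-year value to avoid dividing by zero
replace y_base = y_minpos if y_base == 0

gen y_norm = y / y_base
list, sepby(id)
```

Note that when the base-year value is replaced in this way, as in the original program, the normalized outcome in the base year is 0 rather than 1 for that unit.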