Calculate index with leave-out mean

korenmiklos commented 5 years ago

We should compare to the rest of the EU, not the EU total. The counterfactual is "how much worse off would Hungary be if it had to match the average trade of the rest of the EU". A way to calculate this in Stata pseudocode:

egen sum_eu_j = sum(trade_ijp), by(partner)
egen sum_ij = sum(trade_ijp), by(reporter partner)

gen share_ijp = trade_ijp / sum_ij
* check that it sums to one for all (i,j)

egen total_eu_jp = sum(trade_ijp), by(partner product)
gen rest_eu_ijp = total_eu_jp - trade_ijp
gen share_rest_eu_ijp = rest_eu_ijp / (sum_eu_j - sum_ij)
* check that it sums to one for all (i,j)

gen KLD_component = share_ijp * log(share_ijp / share_rest_eu_ijp)
egen KLD_index = sum(KLD_component), by(reporter partner)
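
For reference, a minimal Python sketch of the same leave-out calculation, using only the standard library. The toy trade values, the `leave_out_kld` function name and the dict layout are assumptions for illustration, not part of the project. It treats rows with zero trade as contributing zero (the usual 0 * log(0) = 0 convention) and assumes the rest of the EU trades every product that the reporter trades, otherwise the ratio is undefined.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (reporter, partner, product) -> trade value.
trade = {
    ("HU", "US", "cars"): 30.0,
    ("HU", "US", "wine"): 10.0,
    ("DE", "US", "cars"): 60.0,
    ("DE", "US", "wine"): 20.0,
    ("FR", "US", "cars"): 20.0,
    ("FR", "US", "wine"): 60.0,
}


def leave_out_kld(trade):
    """Return {(reporter, partner): KLD of the reporter's product shares
    relative to the product shares of all other reporters}."""
    sum_eu_j = defaultdict(float)     # total trade by partner
    sum_ij = defaultdict(float)       # total trade by (reporter, partner)
    total_eu_jp = defaultdict(float)  # total trade by (partner, product)
    for (i, j, p), value in trade.items():
        sum_eu_j[j] += value
        sum_ij[(i, j)] += value
        total_eu_jp[(j, p)] += value

    kld = defaultdict(float)
    for (i, j, p), value in trade.items():
        if value == 0:
            continue  # 0 * log(0) is taken to be 0
        share = value / sum_ij[(i, j)]
        # Leave-out share: everyone's trade in (j, p) minus the reporter's own,
        # divided by everyone's trade with j minus the reporter's own.
        rest_share = (total_eu_jp[(j, p)] - value) / (sum_eu_j[j] - sum_ij[(i, j)])
        kld[(i, j)] += share * math.log(share / rest_share)
    return dict(kld)


kld = leave_out_kld(trade)
for key in sorted(kld):
    print(key, round(kld[key], 4))
```

In this sketch the leave-out totals for every reporter come from the same three aggregates, so the index for all reporters is computed in one pass over a single long table.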

gaborberei commented 5 years ago

I see the point, but I don't know how to integrate this into the present workflow. If we did this, there would be about 100 databases (one for every selected country) at the end of the workflow for the dash app to use. To avoid this, we would need to use a bigger database from an earlier work stage, in which all the data needed to calculate the leave-out mean is available.

gaborberei commented 5 years ago

Sorry, it is not 100 but 28 datasets.

korenmiklos commented 5 years ago

I messed up the pandas aggregation. Debug and double-check.
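
One possible way to double-check the aggregation, sketched under assumptions: a long-format DataFrame whose column names follow the Stata pseudocode (`reporter`, `partner`, `product`, `trade_ijp`), filled with hypothetical toy values. `groupby(...).transform("sum")` returns one value per original row, which is the pandas counterpart of `egen ... , by()`. The two checks are the "sums to one for all (i,j)" comments from the pseudocode.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in long format.
df = pd.DataFrame({
    "reporter": ["HU", "HU", "DE", "DE", "FR", "FR"],
    "partner": ["US"] * 6,
    "product": ["cars", "wine"] * 3,
    "trade_ijp": [30.0, 10.0, 60.0, 20.0, 20.0, 60.0],
})

# Row-aligned group totals, like egen ... = sum(...), by(...).
df["sum_eu_j"] = df.groupby("partner")["trade_ijp"].transform("sum")
df["sum_ij"] = df.groupby(["reporter", "partner"])["trade_ijp"].transform("sum")
df["total_eu_jp"] = df.groupby(["partner", "product"])["trade_ijp"].transform("sum")

# Own shares and leave-out (rest of EU) shares.
df["share_ijp"] = df["trade_ijp"] / df["sum_ij"]
df["share_rest_eu_ijp"] = (
    (df["total_eu_jp"] - df["trade_ijp"]) / (df["sum_eu_j"] - df["sum_ij"])
)

# Both sets of shares should sum to one within every (reporter, partner) pair.
checks = df.groupby(["reporter", "partner"])[["share_ijp", "share_rest_eu_ijp"]].sum()
assert np.allclose(checks.to_numpy(), 1.0), checks
```

If either assertion fails, the printed `checks` table shows which (reporter, partner) pairs are off.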

ceumicrodata / respect-trade-similarity

Calculate index with leave-out mean #11