benjann / estout

Stata module to make regression tables
http://repec.sowi.unibe.ch/stata/estout/index.html
MIT License

Troubles with tabulating multiple results from "estpost tab" #31

Closed bfjarvis closed 2 years ago

bfjarvis commented 2 years ago

I've been trying to use "estout" (version 3.21 from 2016) to create slightly more complicated tables of summary statistics. My main goal is to have the levels of a categorical variable running down the rows of the table, with tabulations (within rows) of several other categorical variables across the columns. I've nearly got what I want, but estout seems to reorder the columns in counterintuitive ways. For example, when running the code below, I expected the "Total" column to always be the last one in each "model" group, but sometimes it is the first. In other data sets, the "Total" column sometimes falls somewhere in the middle of the other categories!

sysuse auto

replace headroom = 10*headroom // just for cleanliness, otherwise labels look funny

estpost tab displace rep78
est sto rep78

estpost tab displace headroom
est sto headroom

estpost tab displace foreign
est sto foreign

estout rep78 headroom foreign, cell(b) unstack label

I looked into the code and noticed that there's an equation-ordering subroutine. Maybe that's what's messing things up? Could there be an option to simply unstack the equations in the order observed in the source matrix?

Or maybe I'm just all wrong-headed about this. Perhaps there is a better way to achieve what I want?

benjann commented 2 years ago

Hi, it seems to me that this is a kind of usage estout was not really made for. The reason for the odd arrangement is that estout first tries to match equations across the "models"; as a result, the "Total" equations are merged together, which leads to the odd positioning after unstacking. Have a look at the output without option unstack and you will see what I mean. By default, estout merges the first equation from each model and merges the remaining equations by name; this behavior makes sense for regression-type models, but not in your case. The behavior can be changed using option equations(), but the possibilities are rather limited.

I would suggest a different approach. It seems easier to make each column a separate model instead of using unstack. However, there will be some other issues with labeling etc. I did not find an easy solution to your problem, but here is one that does what you want, at least in this specific example:

local mgrps
foreach v of var rep78 headroom foreign {
    local mgrps `mgrps' 1
    levelsof `v' if displace<.
    foreach l in `r(levels)' {
        local mgrps `mgrps' 0
        estpost tab displace if `v'==`l'
        est sto `v'_`l', title(`: lab (`v') `l'')
    }
    estpost tab displace if `v'<.
    est sto `v'_Total, title(Total)
}
levelsof displace
estout rep78_* headroom_* foreign_*, cell(b(vacant(0))) ///
     order(`r(levels)') mlab(, titles) collab(none) ///
     mgroups(rep78 headroom foreign, pattern(`mgrps'))

The solution also takes care of the ordering of the rows. However, as you can see, the code is rather involved... ben

bfjarvis commented 2 years ago

I understand your solution, which is along the lines of what I had in mind, albeit with the added wrinkle that I want row percentages as well as observation counts in each cell.

Is there a way to add an option to treat the equation names "as is", forgoing the reordering routine, which seems designed for cross-model comparison of multiple-equation models? Or is this too difficult to shoehorn in at this point?
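
For a single tabulation, row percentages can sit next to the counts without extra work, because a two-way estpost tabulate stores them in e(rowpct) alongside the counts in e(b). A minimal sketch for one column variable, assuming the auto data; the fmt(1) display format is an arbitrary choice:

```stata
sysuse auto, clear
estpost tabulate displace rep78
* counts (b) with row percentages (rowpct) underneath,
* one column per level of rep78
estout, cell(b rowpct(fmt(1))) unstack
```

The difficulty only arises when several such tabulations are combined in one table.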

benjann commented 2 years ago

Changing the code of estout such that equations that have the same name are not merged together would be a nightmare. This is because the code assumes equation names in the combined results to be unique; difficult to change this without breaking anything...
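
To see which names collide, the equation names in e(b) can be listed after each estpost tabulate. A minimal sketch, assuming the auto data; the matrix name b and the macro name eqs are illustrative only. Each tabulation should report a "Total" equation, and that shared name is what gets merged across the sets:

```stata
sysuse auto, clear
foreach v of varlist rep78 foreign {
    quietly estpost tabulate displace `v'
    matrix b = e(b)
    local eqs : coleq b          // equation name of every column of e(b)
    local eqs : list uniq eqs    // keep each name only once
    display as text "`v': " as result `"`eqs'"'
}
```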

A simple workaround could be to make equation names unique by adding a model-specific prefix before applying estout. This could be done while storing the estimation sets. Here is an example:

prog eststo_ep, eclass
    *! version 1.0.0  23aug2021  Ben Jann
    *  store estimation set after adding the name of the set as a prefix to
    *  each equation
    tempname b
    capt confirm matrix e(b)
    if _rc {
        // e(b) does not exist; store set as is
        est sto `0'
        exit
    }
    // add prefix to equations
    mat `b' = e(b)
    mata: st_matrixcolstripe("`b'", (`"`0'_"':+st_matrixcolstripe("`b'")[,1],/*
        */ st_matrixcolstripe("`b'")[,2]))
    eret repost b = `b', rename
    // store set
    est sto `0'
end

sysuse auto

replace headroom = 10*headroom

estpost tab displace rep78
eststo_ep rep78

estpost tab displace headroom
eststo_ep headroom

estpost tab displace foreign
eststo_ep foreign

estout rep78 headroom foreign, cell(b) unstack modelw(8)

(I did not fix the ordering of rows in this example; see above on how to fix this using levelsof displace.)
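
A minimal sketch of that row-ordering fix applied to this example, assuming the three sets stored with eststo_ep are still in memory. It reuses the order() option and the levelsof call from the earlier solution; rows not listed in order(), such as Total, are expected to follow the listed ones:

```stata
* collect the observed values of displace in ascending order
levelsof displace
* order() arranges the rows by these values
estout rep78 headroom foreign, cell(b) unstack modelw(8) ///
    order(`r(levels)')
```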

An issue with this approach is that the prefixes appear in the table header and need to be removed. You could do this using option substitute(), e.g. something like substitute(rep78_ " " ...). How exactly to specify substitute() will depend on output format, because in some output formats the labels are abbreviated. Possibly, things will be a bit easier to handle if the prefixes are very short. Here is a modified example:

prog eststo_ep2, eclass
    *! version 1.0.0  23aug2021  Ben Jann
    *  syntax: eststo_ep2 name [ prefix ]
    *  store estimation set after adding prefix to the equations
    *  name is used as prefix if prefix is omitted
    args name prefix
    if `"`prefix'"'=="" local prefix `name'
    tempname b
    capt confirm matrix e(b)
    if _rc {
        // e(b) does not exist; store set as is
        est sto `name'
        exit
    }
    // add prefix to equations
    mat `b' = e(b)
    mata: st_matrixcolstripe("`b'", (`"`prefix'_"':+st_matrixcolstripe("`b'")[,1],/*
        */ st_matrixcolstripe("`b'")[,2]))
    eret repost b = `b', rename
    // store set
    est sto `name'
end

sysuse auto

replace headroom = 10*headroom

estpost tab displace rep78
eststo_ep2 rep78 1

estpost tab displace headroom
eststo_ep2 headroom 2

estpost tab displace foreign
eststo_ep2 foreign 3

estout rep78 headroom foreign, cell(b) unstack modelw(8) ///
    substitute(1_ "  " 2_ "  " 3_ "  ")

I hope this helps. ben

NilsEnevoldsen commented 2 years ago

@benjann It would be appropriate to close this issue.