Automatic simcf and tile

chrisadolph commented 12 years ago

When I use simcf in my own work, I usually write a long helper function to which I pass my formula and data. The helper function looks through the formula, and then has a series of if() statements which check for each possible covariate that could appear in the model, and, if present, add a new scenario for that covariate to the cf object. This gets complex if there are potentially interaction terms, so inside each if() statement are sub-if() statements which check for possible interactions. It also gets complex when there are categorical or compositional variables, which need to be set in logically consistent ways. In the end, this function is basically sorting all covariates into continuous, binary, categorical, ordered, and compositional piles, and creating appropriate first difference counterfactuals (respectively: mean to mean + 1 sd, 0 to 1, mean to each category or baseline to each other category, mean composition to ratio-preserving counterfactual, etc.).
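As a rough illustration of the sorting step, here is a minimal base R sketch. `classify_covariates()` is a hypothetical name, not a simcf function, and it only guesses types from the column classes; compositional variables cannot be detected this way and would still need to be flagged by the user.

```r
# Hypothetical helper (not part of simcf): sort the covariates named in a
# formula into continuous, binary, categorical, and ordered groups.
classify_covariates <- function(formula, data) {
  covars <- all.vars(formula)[-1]  # drop the response (assumes a two-sided formula)
  kind <- vapply(covars, function(v) {
    x <- data[[v]]
    if (is.ordered(x)) {
      "ordered"
    } else if (is.factor(x) || is.character(x)) {
      "categorical"
    } else if (all(x[!is.na(x)] %in% c(0, 1))) {
      "binary"
    } else {
      "continuous"
    }
  }, character(1))
  split(covars, factor(kind, levels = c("continuous", "binary",
                                        "categorical", "ordered")))
}

# Made-up example data
set.seed(1)
dat <- data.frame(
  y  = rnorm(20),
  x1 = rep(0:1, 10),
  x2 = rnorm(20),
  x3 = factor(rep(c("a", "b", "c", "d"), 5)),
  x4 = factor(rep(c("lo", "mid", "hi", "mid"), 5),
              levels = c("lo", "mid", "hi"), ordered = TRUE)
)
piles <- classify_covariates(y ~ x1 + x2 + x3 + x4, dat)
```

Each element of `piles` is a character vector of covariate names, so a later step could loop over one pile at a time and build the matching first differences.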

We want a function which, given a formula and dataframe, and some tips on what is categorical, ordered, or compositional, does this automagically. When I do this for my own projects, I spend little time messing around with cfChange() code, which gets written for me.

Once we have an automagic simcf function, writing automagic ropeladder code to go with it should be much easier. Indeed, the likely final call from a user, for a model with binary variables x1 and x3, continuous x2, and ordered x4, would be something like:

res <- lm(y~x1+x2+x3+x4, data)
auto.ropeladder(res, data, binary=c(1,3), continuous=c(2), ordered=c(4), conf=0.95)
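
A minimal sketch of how positional hints such as binary=c(1,3) might be turned into covariate names, using only base R. `resolve_hints()` is a hypothetical helper; the real auto.ropeladder() interface is not settled in this thread.

```r
# Hypothetical helper: map positional type hints onto the term labels of a formula.
resolve_hints <- function(formula, ...) {
  labels <- attr(terms(formula), "term.labels")
  hints <- list(...)
  idx <- unlist(hints)
  # every index must point at a term, and no term may have two types
  stopifnot(all(idx %in% seq_along(labels)), !anyDuplicated(idx))
  lapply(hints, function(i) labels[i])
}

hints <- resolve_hints(y ~ x1 + x2 + x3 + x4,
                       binary = c(1, 3), continuous = 2, ordered = 4)
```

Note that interaction terms appear as their own term labels (for example "x1:x2"), so positions would count them too; that is one of the design questions an index-based interface would have to settle.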

All this will be hard, but will be the core of our second release of tile+simcf to CRAN.

ghost commented 12 years ago

Hey Chris,

I would like to start chipping away at this project -- would you mind posting/sending along one of your example helper functions so I can see what you've done in the past? Thanks,

Mike

chrisadolph commented 12 years ago

I've uploaded two examples (with working code, data, and example pdf output; one for logit, and one for ordered probit) of how I do this here. There isn't much documented here in terms of what these data are, so let me know if you need background.

ghost commented 12 years ago

Chris,

Could you outline a bit more specifically what the counterfactuals should be for each type of covariate? Here is what I have:

continuous: mean to mean + 1 SD (got it)

binary: 0 to 1 (got it)

ordered: mean to each category or baseline to each other category. I'm not sure how to implement either of these options; I'm thinking specifically about the cfChange line that would mirror this code for continuous variables:

cfChange(xscen, paste(s.clean), x = mean(data) + sd(data), scen=scen.num)

compositional piles: to be honest, I'm not sure what a compositional pile is, or how I would compare its mean to a ratio-preserving counterfactual.

Any info is appreciated. Thanks,

chrisadolph commented 12 years ago

On 2/28/12 8:24 AM, mikefree88 wrote:

Chris,

Could you outline a bit more specifically what the counterfactuals should be for each type of covariate? Here is what I have:

There should also be the ability to globally override these defaults with alternatives for the pre and post for each type. (E.g., if I want to set all continuous covariate scenarios to be (mean - 1 sd) to (mean + 1 sd), I should be able to.)
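
One possible shape for such overridable defaults, sketched in base R. The names `default_rules`, `my_rules`, and `first_difference()` are hypothetical and not part of simcf; the idea is only that each covariate type carries a pair of functions for the pre and post values, which a user can replace wholesale.

```r
# Hypothetical defaults: one pre/post rule per covariate type.
default_rules <- list(
  continuous = list(pre  = function(x) mean(x),
                    post = function(x) mean(x) + sd(x)),
  binary     = list(pre  = function(x) 0,
                    post = function(x) 1)
)

# Global override: all continuous covariates go from (mean - 1 sd) to (mean + 1 sd).
my_rules <- modifyList(default_rules, list(
  continuous = list(pre  = function(x) mean(x) - sd(x),
                    post = function(x) mean(x) + sd(x))
))

# Compute the pre and post values for one covariate under a set of rules.
first_difference <- function(x, type, rules = default_rules) {
  c(pre = rules[[type]]$pre(x), post = rules[[type]]$post(x))
}

x <- c(2, 4, 6, 8)
d_default  <- first_difference(x, "continuous")            # mean to mean + 1 sd
d_override <- first_difference(x, "continuous", my_rules)  # mean - 1 sd to mean + 1 sd
```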

continuous: mean to mean + 1 SD (got it)

yes

binary: 0 to 1 (got it)

yes

ordered: mean to each category or baseline to each other category

yes

I'm not sure how to implement either of these options; I'm thinking specifically about the cfChange line that would mirror this code for continuous variables:

cfChange(xscen, paste(s.clean), x = mean(data) + sd(data), scen=scen.num)

You need to set up one scenario for each level of the categorical 
variable, so you have to figure out how many categories and what they are.
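
A minimal base R sketch of that bookkeeping, under the baseline-to-each-other-category option. `level_scenarios()` is a hypothetical helper, not a simcf function, and the example factor is made up.

```r
# Hypothetical helper: list one scenario per non-baseline level of a factor.
level_scenarios <- function(x, baseline = NULL) {
  x <- as.factor(x)
  lev <- levels(x)                       # how many categories, and what they are
  if (is.null(baseline)) baseline <- lev[1]
  stopifnot(baseline %in% lev)
  others <- setdiff(lev, baseline)
  data.frame(scen = seq_along(others), pre = baseline, post = others,
             stringsAsFactors = FALSE)
}

region <- factor(c("west", "south", "east", "south", "north"))
scens <- level_scenarios(region, baseline = "south")
```

Each row of `scens` would then correspond to one scenario in the cf object. How the levels translate into cfChange() arguments depends on how the factor is coded in the model, which this sketch leaves open.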

compositional piles: to be honest, I'm not sure what a compositional pile is, or how I would compare its mean to a ratio-preserving counterfactual.

Not sure what "pile" means either. Suppose you have a three-component compositional variable, like this:

{% of population < 18 years, % of population >= 18 and < 65, % of population >= 65}

In any specific case, these components will sum to a fixed constant, like 1.0 or 100.

If three covariates (or two covariates and a reference category) are defined by the user as belonging to the same composition, then any hypothetical change in one should lead to a logically compatible change in the others preserving the fixed sum of all components. rpcf() can help calculate these; I'll tell more about this later.
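
To make the idea concrete, here is a small base R sketch of one ratio-preserving change. It is not rpcf() and does not assume anything about that function's interface; `ratio_preserving()` and the example shares are hypothetical.

```r
# Hypothetical helper (not simcf's rpcf()): set component k to a new value and
# rescale the remaining components so that their ratios to one another, and
# the total of the composition, stay the same.
ratio_preserving <- function(comp, k, new_value) {
  total <- sum(comp)
  rest <- comp[-k]
  stopifnot(new_value >= 0, new_value <= total, sum(rest) > 0)
  out <- comp
  out[k] <- new_value
  out[-k] <- rest * (total - new_value) / sum(rest)
  out
}

# Made-up age composition summing to 1
age <- c(under18 = 0.25, working = 0.60, over65 = 0.15)
cf <- ratio_preserving(age, k = 3, new_value = 0.20)
```

Here raising the oldest group's share shrinks the other two proportionally, so the under-18 to working-age ratio is unchanged and the shares still sum to 1.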

Chris


ghost commented 12 years ago

Quick question about defaults: to clarify, the xpost values should always be the same as the xpre values unless the variable is the one being simulated, right? For example, with a binary variable, we would set xpre to 0 in all scenarios, and xpost to 0 in all scenarios except the one that evaluates that binary variable. I just wanted to double check. Thanks,
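
One way to picture the default being described, as a base R sketch rather than actual simcf code. `build_scenarios()` is a hypothetical helper and the matrices only stand in for the xpre and xpost parts of a cf object: every row starts from the same baseline, and only the covariate under study differs between xpre and xpost.

```r
# Hypothetical bookkeeping sketch: one row per scenario, one column per covariate.
build_scenarios <- function(baseline, changes) {
  n <- length(changes)
  xpre <- matrix(baseline, nrow = n, ncol = length(baseline), byrow = TRUE,
                 dimnames = list(names(changes), names(baseline)))
  xpost <- xpre                          # xpost defaults to xpre everywhere
  for (i in seq_len(n)) {
    v <- names(changes)[i]               # the covariate varied in scenario i
    xpre[i, v]  <- changes[[i]][["pre"]]
    xpost[i, v] <- changes[[i]][["post"]]
  }
  list(xpre = xpre, xpost = xpost)
}

baseline <- c(x1 = 0, x2 = 5)            # binary x1 held at 0, continuous x2 at its mean
changes <- list(x1 = c(pre = 0, post = 1),
                x2 = c(pre = 5, post = 7))
sc <- build_scenarios(baseline, changes)
diffs <- sc$xpost - sc$xpre              # nonzero only for the varied covariate
```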

Mike

chrisadolph / tile-simcf

Automatic simcf and tile #18