carloscinelli / sensemakr

Suite of sensitivity analysis tools for OLS
https://carloscinelli.com/sensemakr/
83 stars 15 forks source link

plot with natural parameters #8

Closed chadhazlett closed 6 years ago

chadhazlett commented 6 years ago

Should be an option, right?

carloscinelli commented 6 years ago

I've been thinking about this, and not sure the way to proceed. Since the natural plot can only be done for the bias contour, we will not have any of the other options the R2 has --- t-value, worst case etc. Also since both the relative and absolute interpretation depend on the natural scale it makes it hard to allow the automated interpretation (imagine saying "the impact has to be 13332.3", without units, which is much less appealing than the R2 one ("x% of the residual variation") and for the relative comparisons, the user would have to be aware that comparing impacts and imbalances between the confounder and the covariates would have to be in the same scale, which might be hard to make it explicitly). So I was thinking about the possibility of having a different function for it, instead of making it an option, so the user know exactly what he's doing. Not sure yet though.

chadhazlett commented 6 years ago

I think best as an option (from the user's perspective). But possibly a separate "contourplot" function for the natural scale since the other things don't apply. Thinking about it, the options space is really something like:

That's it right? So why not just have four options: "R2", "natural", "t", and "worstcase". It just happens that they are all contour plots except worstcase. I think that's fine.

Finally, in the spirit of taking one step at a time, please let's get the graph up for the natural scale, and we can figure out the right language for it and the verbal output on the natural scale after.

carloscinelli commented 6 years ago

In terms of the contour plot, it would probably be better to have two arguments: the "scale" or "unit" argument "scale = c("R2", "std", "natural")" and then the "contour" or "levels" argument, "contour = c("estimate", "se", "ci low", "ci up")". We would also have to treat separately some things like limits, labels, benchmarking (probably a separate argument asking explicitly for benchmark variables in the same scale) etc.

I was considering this as very low priority --- my idea was to get everything in the R2 scale working well first.

chadhazlett commented 6 years ago

This is a relatively high priority please, as well as the grouping and the "showvars" issue. Both are basic functionality we need before we ship and that we even would use in the paper.

I'd like to drop the "std" version. no need for a plot of that type.

Again it seems like almost all the plots are contour and so most people will come to understand that what we mean by a plot of a sensitivity analysis is a contour plot. It's the worstcase plot that is the odd one out.

So why don't we have the two functions, countourplot(scale=c("R2","natural), value=c("est","t","upper","lower"), benchmarkset)

(noting that some choices of value will force R2 -- can just throw a warning and do what we want when that happens).

worstcaseplot(benchmarkset)

If you want to wrap them together into a generic plot method that takes type=c("contour","worstcase") as an argument and then dispatches to these two that is fine. But I'd like to get these two functions up and running in the next few days.

On Mon, Jul 24, 2017 at 12:01 PM, Carlos Cinelli notifications@github.com wrote:

In terms of the contour plot, it would probably be better to have two arguments: the "scale" or "unit" argument "scale = c("R2", "std", "natural")" and then the "contour" or "levels" argument, "contour = c("estimate", "se", "ci low", "ci up")". We would also have to treat separately some things like limits, labels, benchmarking (probably a separate argument asking explicitly for benchmark variables in the same scale) etc.

I was considering this as very low priority --- my idea was to get everything in the R2 scale working well first.

At first this was a very low priority for me, I was hoping to get parametrization working well first.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/chadhazlett/sensemakr/issues/8#issuecomment-317522358, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyAwiiDFELk0Xsc632Pj5bnYGghaIks5sROojgaJpZM4OeuS7 .

statsccpr commented 6 years ago

Rounding back to this post,

I agree that the plotting should take priority.

Feature wise, i see it as plots > grouping (so we can do plots of groups) > showvars (which just selectively subsets specific already computed quantities)

The groundwork for the grouping feature is here

https://github.com/chadhazlett/sensemakr/pull/6

Operationally, I think the branch merge of the grouping should be the first thing to do. When I made the pull request, the branch was mergeable, 7 days later, it is no longer mergeable

With 1 single conflict https://github.com/chadhazlett/sensemakr/pull/6/conflicts

The conflict is basically the 'main branch' is still using row names instead of the 'group branch' way of using the term to model matrix hash table mapping.

I think it's up to @carloscinelli to think about how he sees the grouping features using 'term hash table' instead of row names can be used in his plot code

carloscinelli commented 6 years ago

I probably won't be able to code in the next couple weeks, but feel free to code what you think needs to be coded

statsccpr commented 6 years ago

So the latest commit can do R2 contour plots of grouped terms

I'm focusing my attention on masking model matrix terms that are tied to the grouping This is related to the plot(...,showvars) feature

I propose the final returned list of quantities that sensemakr() spits out is

  benchmarks <- list(benchmark_all_vars = benchmark_all_vars,
                     benchmark_R2 = benchmark_R2,
                     benchmark_masked = benchmark_masked,
                     benchmark_R2_group = benchmark_R2_group,
                     benchmark_natural = benchmark_natural,
                     benchmark_std = benchmark_std)

I need clarification on what the intended use of "benchmark_R2" is to be used for The way I read the code is its a way for showvars + group terms.

comparing line 195 - 211

https://github.com/chadhazlett/sensemakr/blob/master/R/sensemakr.R

benchmark_R2 has the same quantities as benchmark_all_vars , just a specific subset of them from the ones specified in X

I don't think it's needed, since there is now an explicit 'benchmark_R2_group' returned. More specifically, i don't think the 'subset' portion is needed in this step, and should be saved as the 'showvars' option discussed here

https://github.com/chadhazlett/sensemakr/issues/3

If the user wants specific quantities to plot, they can now move to toggling the showvars option

plot(...,showvars)

Depending on the choice of showvars, the plot internals would then pluck quantities from

benchmark_masked would be used if the user wishes to only plot points for the parent group (village) and mask the points of the model matrix columns related to the parent group (vilage1,village2,...,vilage100)

statsccpr commented 6 years ago

Seems like benchmark_all_vars is only used for the worst case plot where all variables are simultaneously dropped. Instead, I thought this output had 'each' variable dropped one at a time

This cycling thru each variable is really stored in benchmark_R2 (what I said was not necessary in the comment above) But it turns out, this is necessary, just a bad name

This internal naming scheme is a bit confusing. I think a better naming scheme (to disambiguate the above situation) is...

  benchmark_worstcase_dropallvars = benchmark_all_vars  # single row of quantities
  benchmark_each_var = benchmark_R2  # many rows
  benchmark_group = benchmark_R2_group  # many rows
  benchmark_masked = benchmark_each_var[!(benchmark_each_var %in% benchmark_R2_group)]  # many rows
  benchmark_natural = benchmark_natural  # many rows  
  # benchmark_std = benchmark_std # not necessary
statsccpr commented 6 years ago

a lot of the propositions are now in the unify branch

see the comment here related to the showvars feature

https://github.com/chadhazlett/sensemakr/issues/3#issuecomment-319465135