Closed chadhazlett closed 6 years ago
I've been thinking about this, and not sure the way to proceed. Since the natural plot can only be done for the bias contour, we will not have any of the other options the R2 has --- t-value, worst case etc. Also since both the relative and absolute interpretation depend on the natural scale it makes it hard to allow the automated interpretation (imagine saying "the impact has to be 13332.3", without units, which is much less appealing than the R2 one ("x% of the residual variation") and for the relative comparisons, the user would have to be aware that comparing impacts and imbalances between the confounder and the covariates would have to be in the same scale, which might be hard to make it explicitly). So I was thinking about the possibility of having a different function for it, instead of making it an option, so the user know exactly what he's doing. Not sure yet though.
I think best as an option (from the user's perspective). But possibly a separate "contourplot" function for the natural scale since the other things don't apply. Thinking about it, the options space is really something like:
That's it right? So why not just have four options: "R2", "natural", "t", and "worstcase". It just happens that they are all contour plots except worstcase. I think that's fine.
Finally, in the spirit of taking one step at a time, please let's get the graph up for the natural scale, and we can figure out the right language for it and the verbal output on the natural scale after.
In terms of the contour plot, it would probably be better to have two arguments: the "scale" or "unit" argument "scale = c("R2", "std", "natural")" and then the "contour" or "levels" argument, "contour = c("estimate", "se", "ci low", "ci up")". We would also have to treat separately some things like limits, labels, benchmarking (probably a separate argument asking explicitly for benchmark variables in the same scale) etc.
I was considering this as very low priority --- my idea was to get everything in the R2 scale working well first.
This is a relatively high priority please, as well as the grouping and the "showvars" issue. Both are basic functionality we need before we ship and that we even would use in the paper.
I'd like to drop the "std" version. no need for a plot of that type.
Again it seems like almost all the plots are contour and so most people will come to understand that what we mean by a plot of a sensitivity analysis is a contour plot. It's the worstcase plot that is the odd one out.
So why don't we have the two functions, countourplot(scale=c("R2","natural), value=c("est","t","upper","lower"), benchmarkset)
(noting that some choices of value will force R2 -- can just throw a warning and do what we want when that happens).
worstcaseplot(benchmarkset)
If you want to wrap them together into a generic plot method that takes type=c("contour","worstcase") as an argument and then dispatches to these two that is fine. But I'd like to get these two functions up and running in the next few days.
On Mon, Jul 24, 2017 at 12:01 PM, Carlos Cinelli notifications@github.com wrote:
In terms of the contour plot, it would probably be better to have two arguments: the "scale" or "unit" argument "scale = c("R2", "std", "natural")" and then the "contour" or "levels" argument, "contour = c("estimate", "se", "ci low", "ci up")". We would also have to treat separately some things like limits, labels, benchmarking (probably a separate argument asking explicitly for benchmark variables in the same scale) etc.
I was considering this as very low priority --- my idea was to get everything in the R2 scale working well first.
At first this was a very low priority for me, I was hoping to get parametrization working well first.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/chadhazlett/sensemakr/issues/8#issuecomment-317522358, or mute the thread https://github.com/notifications/unsubscribe-auth/AFVCyAwiiDFELk0Xsc632Pj5bnYGghaIks5sROojgaJpZM4OeuS7 .
Rounding back to this post,
I agree that the plotting should take priority.
Feature wise, i see it as plots > grouping (so we can do plots of groups) > showvars (which just selectively subsets specific already computed quantities)
The groundwork for the grouping feature is here
https://github.com/chadhazlett/sensemakr/pull/6
Operationally, I think the branch merge of the grouping should be the first thing to do. When I made the pull request, the branch was mergeable, 7 days later, it is no longer mergeable
With 1 single conflict https://github.com/chadhazlett/sensemakr/pull/6/conflicts
The conflict is basically the 'main branch' is still using row names instead of the 'group branch' way of using the term to model matrix hash table mapping.
I think it's up to @carloscinelli to think about how he sees the grouping features using 'term hash table' instead of row names can be used in his plot code
I probably won't be able to code in the next couple weeks, but feel free to code what you think needs to be coded
So the latest commit can do R2 contour plots of grouped terms
I'm focusing my attention on masking model matrix terms that are tied to the grouping This is related to the plot(...,showvars) feature
I propose the final returned list of quantities that sensemakr() spits out is
benchmarks <- list(benchmark_all_vars = benchmark_all_vars,
benchmark_R2 = benchmark_R2,
benchmark_masked = benchmark_masked,
benchmark_R2_group = benchmark_R2_group,
benchmark_natural = benchmark_natural,
benchmark_std = benchmark_std)
I need clarification on what the intended use of "benchmark_R2" is to be used for The way I read the code is its a way for showvars + group terms.
comparing line 195 - 211
https://github.com/chadhazlett/sensemakr/blob/master/R/sensemakr.R
benchmark_R2 has the same quantities as benchmark_all_vars , just a specific subset of them from the ones specified in X
I don't think it's needed, since there is now an explicit 'benchmark_R2_group' returned. More specifically, i don't think the 'subset' portion is needed in this step, and should be saved as the 'showvars' option discussed here
https://github.com/chadhazlett/sensemakr/issues/3
If the user wants specific quantities to plot, they can now move to toggling the showvars option
plot(...,showvars)
Depending on the choice of showvars, the plot internals would then pluck quantities from
benchmark_all_vars ** quantities for all columns of the model matrix related to the right hand side of the lm formula
benchmark_R2_group ** quantities related to user specified 'group_list' along with our internally enforced grouping rules (factor levels)
benchmark_masked ** a subset of benchmark_all_vars that has redundant info (already contained in benchmark_R2_group) masked out
benchmark_R2 (which I do not think is needed anymore)
benchmark_masked would be used if the user wishes to only plot points for the parent group (village) and mask the points of the model matrix columns related to the parent group (vilage1,village2,...,vilage100)
Seems like benchmark_all_vars
is only used for the worst case plot where all variables are simultaneously dropped. Instead, I thought this output had 'each' variable dropped one at a time
This cycling thru each variable is really stored in benchmark_R2
(what I said was not necessary in the comment above) But it turns out, this is necessary, just a bad name
This internal naming scheme is a bit confusing. I think a better naming scheme (to disambiguate the above situation) is...
benchmark_worstcase_dropallvars = benchmark_all_vars # single row of quantities
benchmark_each_var = benchmark_R2 # many rows
benchmark_group = benchmark_R2_group # many rows
benchmark_masked = benchmark_each_var[!(benchmark_each_var %in% benchmark_R2_group)] # many rows
benchmark_natural = benchmark_natural # many rows
# benchmark_std = benchmark_std # not necessary
a lot of the propositions are now in the unify branch
see the comment here related to the showvars feature
https://github.com/chadhazlett/sensemakr/issues/3#issuecomment-319465135
Should be an option, right?