Good to have a native speaker on board! :-)
Happy to be here, to say the least.
Man, I haven't had time to re-read the final pdf. This timezone thing is tough; we should define some easybreaks with no critical activity for lunches, dinners, parties and whatnot.
But it's okay, I am sure the revisions are perfect.
Ha! I like that: easybreaks. But I hear you - the time is tricky. But rest assured we incorporated your suggestions. It's looking good!
> lunches, dinners, parties and whatnot.
Do this once you're retired! Believe me, I studied gerontology ;-)
@DominiqueMakowski @pdwaggoner
We got the response from Alex, so let's see what we can address in which way.
> find_weights() -> weights_name(), get_weights() -> weights_values()
This is doable, the question is, do we want that? According to our code-style convention, we should stay with get/find. We could, of course, add aliases for all functions if this makes life easier for users.
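For illustration, such an alias would be a one-liner per function (a sketch, assuming we keep the find/get names as canonical; the alias names below are just the reviewer's suggestions):

```r
# Hypothetical aliases layered on top of the existing functions:
weights_name <- find_weights
weights_values <- get_weights
```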
> Note that you have chosen to define terms and variables in a non-standard way for the R community. In an R model formula an entire chunk such as I(x^2) is referred to as a term, rather than a variable. I recommend following this definition, which amounts to flipping the current usage of term and variable.
I'm not sure about the definition, and couldn't yet find proper "definitions". If we really break conventions here, it might be better to flip the names, as suggested. That would mean that we need to use an intermediate version check for insight in our packages where we use one of these functions, else we will have breaking code.
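For illustration, such an intermediate version check in a downstream package might look like this (a sketch; "0.9.0" is a placeholder, not an actual insight release number):

```r
# Hypothetical guard while the renaming propagates; "0.9.0" stands in
# for whichever insight release swaps the two functions.
if (utils::packageVersion("insight") >= "0.9.0") {
  model_terms <- insight::find_terms(model)
} else {
  model_terms <- insight::find_variables(model)
}
```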
> Finally, I recommend grouping get*() and find*() together on the pkgdown site in the Reference section. That is, you want get_parameters() and find_parameters() (or parameter_names() and parameter_values()) to appear next to each other.
Not sure how to do this in pkgdown, except when we "merge" all related find/get-functions together in one doc. But that would mess up the help-files, so I would suggest finding another solution.
> It makes me nervous that the example at https://easystats.github.io/insight/reference/clean_parameters.html warns about overwriting quosure methods.
This is an R 3.6 thing, and related to other packages. Easy to address, I think (by just responding/explaining)
> I appreciate that you've linked to packages that use insight. However, this is not especially useful to users -- instead of demonstrating how to solve a specific problem with insight, the user must now go read an entire package on their own, find a usage of insight, and infer the problem that is being solved. I would construct explicit, self-contained examples showing idiomatic usage. If these are inspired by the linked packages, fantastic, but merely linking to another code base isn't that helpful.
Not sure. Maybe it's easy to address, if we just add one or two code-examples to the paper? If so, that is rather easy to address.
> Instead of providing a new generic n_obs() I would provide methods for the existing generic nobs().
On the one hand, yes, we could do that. On the other hand, once other packages add their own nobs() method, users will always see the warning that insight overwrites S3-methods. But this is probably not often the case. I would suggest keeping n_obs(), and adding nobs() where needed (the definition would then be nobs.xyz <- n_obs.xyz), what do you think?
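To make that concrete, a minimal sketch of such a delegating method (the class "xyz" is just a placeholder for a model class that lacks an existing nobs() method):

```r
# Hypothetical S3 method that simply delegates to insight:
nobs.xyz <- function(object, ...) {
  insight::n_obs(object)
}
```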
1) > find_weights() -> weights_name(), get_weights() -> weights_values()
This is doable, the question is, do we want that? According to our code-style convention, we should stay with get/find. We could, of course, add aliases for all functions if this makes life easier for users.
I agree with you.
2) > Note that you have chosen to define terms and variables in a non-standard way for the R community. In an R model formula an entire chunk such as I(x^2) is referred to as a term, rather than a variable. I recommend following this definition, which amounts to flipping the current usage of term and variable.
I'm not sure about the definition, and couldn't yet find proper "definitions". If we really break conventions here, it might be better to flip the names, as suggested. That would mean that we need to use an intermediate version check for insight in our packages where we use one of these functions, else we will have breaking code.
> model <- lm(Sepal.Length ~ I(Sepal.Width^2), data = iris)
> insight::find_variables(model)
$response
[1] "Sepal.Length"
$conditional
[1] "I(Sepal.Width^2)"
> insight::find_terms(model)
$response
[1] "Sepal.Length"
$conditional
[1] "Sepal.Width"
I am okay with swapping the two. Better to make the breaking change now than in the future.
> Finally, I recommend grouping get*() and find*() together on the pkgdown site in the Reference section. That is, you want get_parameters() and find_parameters() (or parameter_names() and parameter_values()) to appear next to each other.
>
> Not sure how to do this in pkgdown, except when we "merge" all related find/get-functions together in one doc. But that would mess up the help-files, so I would suggest finding another solution.
AFAIK the only way to do that is indeed to have the same docstring for both functions using @rdname. I think it's doable in most cases to merge the docstrings.
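As a sketch of the @rdname approach (function bodies elided to bare generics; this just shows how two functions end up on one help page and thus next to each other in the reference):

```r
#' Find names of model parameters
#' @rdname find_parameters
#' @export
find_parameters <- function(x, ...) UseMethod("find_parameters")

#' @rdname find_parameters
#' @export
get_parameters <- function(x, ...) UseMethod("get_parameters")
```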
> It makes me nervous that the example at https://easystats.github.io/insight/reference/clean_parameters.html warns about overwriting quosure methods.
>
> This is an R 3.6 thing, and related to other packages. Easy to address, I think (by just responding/explaining).
Agreed.
> I appreciate that you've linked to packages that use insight. However, this is not especially useful to users -- instead of demonstrating how to solve a specific problem with insight, the user must now go read an entire package on their own, find a usage of insight, and infer the problem that is being solved. I would construct explicit, self-contained examples showing idiomatic usage. If these are inspired by the linked packages, fantastic, but merely linking to another code base isn't that helpful.
>
> Not sure. Maybe it's easy to address, if we just add one or two code-examples to the paper? If so, that is rather easy to address.
I think what is lacking here is a small "Example" section on the README. We could demonstrate some of the find_* functions (as the get_* ones have a long output not really appropriate for a README) for a model such as the one above (with the iris dataset)? I can give it a try.
> Instead of providing a new generic n_obs() I would provide methods for the existing generic nobs().
>
> On the one hand, yes, we could do that. On the other hand, once other packages add their own nobs() method, users will always see the warning that insight overwrites S3-methods. But this is probably not often the case. I would suggest keeping n_obs(), and adding nobs() where needed (the definition would then be nobs.xyz <- n_obs.xyz), what do you think?
I agree, we should not redefine nobs as it is indeed used by quite a few packages, but rather add the method where necessary.
These are all good thoughts from Alex as well as from you both on moving forward. My broader question concerns the scope and location of the changes requested: it's unclear whether Alex is requesting changes in the paper or in the repo/software. The answer would influence where and how we proceed in making corrections. After reading my responses below, and in addition to this question, let me know how you'd like to proceed with corrections to get this thing published! We are close. Any thoughts there?
Finally, here are a few thoughts on the points where it makes most sense for me to respond:
> This is doable, the question is, do we want that? According to our code-style convention, we should stay with get/find. We could, of course, add aliases for all functions if this makes life easier for users.
I also agree that we should stick to (and thus defend, appropriately) our code-style convention.
> I'm not sure about the definition, and couldn't yet find proper "definitions". If we really break conventions here, it might be better to flip the names, as suggested. That would mean that we need to use an intermediate version check for insight in our packages where we use one of these functions, else we will have breaking code.
I also agree that making a switch consistent with broader R conventions is probably a good idea, though, I too was unaware of this broader community definition. Not sure if this is worth fighting for or not...
> Not sure. Maybe it's easy to address, if we just add one or two code-examples to the paper? If so, that is rather easy to address.
I agree with @DominiqueMakowski here - I think we have demonstrated usability for a "problem" a user may encounter in the paper already (though see my comments at the outset here, on whether Alex wants this change in the documentation or in the paper). If it's the paper, I think we gently point him to the fact that we have addressed this already, but that we could add a sentence clarifying that we have demonstrated a "real-world" problem and how insight helps with that. But if he wants the change in the documentation, then I like @DominiqueMakowski's suggestion here of adding a couple more "applied" examples to the README. Let me know if you need my help here at all!
Oh also, for what it's worth, I agree with Alex that the name should maybe switch back to the original, as we aren't necessarily introducing new syntax.
I have found the following pages: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html https://bookdown.org/ccolonescu/RPoE4/indvars.html https://www.datacamp.com/community/tutorials/r-formula-tutorial
These indicate that we indeed swapped "variable" and "term" and thus should flip the functions.
@DominiqueMakowski I think we have addressed most issues from the 2nd review round now. Still missing is the example.
> I think what is lacking here is a small "Example" section on the README. We could demonstrate some of the find_* functions (as the get_* ones have a long output not really appropriate for a README) for a model such as the one above (with the iris dataset)? I can give it a try.
Do you think you'll find some time for this in the next few days, or should I start working on this?
As you wish; if you already have an idea of what to do, please don't wait for me. I'll check and expand if need be.
Hi guys - apologies for my silence lately on this. Teaching for the summer is nearly finished! I can give the example a try in about 3 hours when I finish teaching. I should have something shortly, if you want to wait. Obviously, if you'd like to move faster (< 3 hr.), please feel free to do so.
Ok, I'll wait for your input, I'm working on another issue anyway (the nobs() thing Alex mentioned).
@pdwaggoner @DominiqueMakowski
There's one comment:
> I appreciated the new definitions of the various components of a model. Instead of presenting this material all at once, I recommend you group the definitions by components that are easy to confuse and set these groups visually apart.
I'm not 100% sure what is actually meant; do you have an idea?
Maaaaybe something like (strong speculation here): x + poly(x, 2) has only the variable x. x + poly(x, 2) has one variable, x, but two terms, x and poly(x, 2).
(btw, I spotted a typo: Predictors are "unqiue")
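Base R itself makes the same distinction, which we could cite in support (using only stats functions):

```r
# terms() treats poly(x, 2) as a single term, while all.vars()
# returns only the underlying variables.
f <- y ~ x + poly(x, 2)
attr(terms(f), "term.labels")
#> [1] "x"          "poly(x, 2)"
all.vars(f)
#> [1] "y" "x"
```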
Ok, what do you think about this, covering most of the find_*() functions? If you like it, I am happy to include a new section in the README with this. Let me know:
> library(insight)
> # Fit the model
> model <- lm(Sepal.Length ~ I(Sepal.Width ^ 2),
+ data = iris)
> # find_algorithm: Find sampling algorithm and optimizers
> find_algorithm(model)
$algorithm
[1] "OLS"
> # find_formula: Find model formula
> find_formula(model)
$conditional
Sepal.Length ~ I(Sepal.Width^2)
> # find_interactions: Find interaction terms from models
> find_interactions(model)
NULL
> ## Fit INXN model
> model_inxn <- lm(Sepal.Length ~ Petal.Length * Sepal.Width,
+ data = iris)
> find_interactions(model_inxn)
$conditional
[1] "Petal.Length:Sepal.Width"
> # find_parameters: Find names of model parameters
> find_parameters(model)
$conditional
[1] "(Intercept)" "I(Sepal.Width^2)"
> find_parameters(model_inxn)
$conditional
[1] "(Intercept)" "Petal.Length" "Sepal.Width" "Petal.Length:Sepal.Width"
> # find_predictors: Find names of model predictors
> find_predictors(model)
$conditional
[1] "Sepal.Width"
> find_predictors(model_inxn)
$conditional
[1] "Petal.Length" "Sepal.Width"
> # find_response: Find name of the response variable
> find_response(model)
[1] "Sepal.Length"
> # find_terms: Find all model terms
> find_terms(model)
$response
[1] "Sepal.Length"
$conditional
[1] "I(Sepal.Width^2)"
> find_terms(model_inxn)
$response
[1] "Sepal.Length"
$conditional
[1] "Petal.Length" "Sepal.Width"
> # find_variables: Find names of all variables
> find_variables(model)
$response
[1] "Sepal.Length"
$conditional
[1] "Sepal.Width"
Also, I just merged a PR fixing that typo, @DominiqueMakowski. Good catch!
Another (complementary) idea to demonstrate the usefulness of the package (rather than demonstrating the functions) would be to underline the "unified interface" aspect.
What does it allow us to do? For instance, let's say we have a lot of models of different nature:
model1 <- lm(...)
model2 <- lmer(...)
model3 <- stan_glm(...)
model4 <- glmmTMB(...)
model5 <- gamm4(...)
With insight we can easily loop through them and apply the same insight functions; it works in a consistent, neat fashion.
for (model in list(model1, model2, ...)) {
  insight::find_parameters(model)
}
I would put this into a specific context, e.g. credible intervals for random effects of the zero-inflated component:
m <- download_model("brms_zi_3")
sapply(get_parameters(m, effects = "random", component = "zi"), quantile, probs = c(.025, .975))
Or only for random effects of the conditional component, so we can apply it to rstanarm and brms models:
sapply(get_parameters(m, effects = "random", component = "conditional"), quantile, probs = c(.025, .975))
or something like this, showing how to use it for predict():
m <- lm(Sepal.Length ~ Species + Petal.Width + Sepal.Width, data = iris)
dat <- get_data(m)
pred <- find_predictors(m, flatten = TRUE)
l <- lapply(pred, function(x) {
  if (is.numeric(dat[[x]]))
    mean(dat[[x]])
  else
    unique(dat[[x]])
})
names(l) <- pred
cbind(l, predictions = predict(m, newdata = as.data.frame(l)))
And regarding the second remaining issue, I would slightly modify your suggestion: x + poly(x, 2) has only the variable x. x + poly(x, 2) has one variable, x, but two terms, x and poly(x, 2).
Maybe, following the examples, we can mix both my suggestion and Dominique's suggestion of how to demonstrate the usefulness? And I would put these examples before the examples from other packages in the paper, and probably somewhere at the end of the readme?
Great suggestions - I was just working on something similar, as you wouldn't be able to loop over model objects, thereby necessitating use of the *apply family. (And I agree on placement in both the paper and the README; I am happy to make those changes once you guys OK the example.) Here is my idea. What do you think?
# model 1 - lm
model1 <- lm(Sepal.Length ~ I(Sepal.Width ^ 2), data = iris)
# model 2 - glm
model2 <- glm(am ~ mpg + hp, family = binomial, data = mtcars)
# model 3 - gamm4
library(gamm4)
set.seed(0)
dat <- gamSim(1, n = 400, scale = 2)
dat$fac <- fac <- as.factor(sample(1:20, 400, replace = TRUE))
dat$y <- dat$y + model.matrix(~fac - 1) %*% rnorm(20) * 0.5
model3 <- gamm4(y ~ s(x0) + x1 + s(x2), data = dat, random = ~(1 | fac))
models <- paste0("model", 1:3)
sapply(models, function(x) insight::find_parameters(get(x)))
What do you think? https://easystats.github.io/insight/articles/insight.html
See also readme: https://github.com/easystats/insight/
Next, I will add the examples.
These are really great changes! For the README examples, are you thinking more technical (e.g., using sapply)? Or are you thinking to essentially replicate the pkgdown examples? And should we then use these same examples again in the paper? Or leave the current paper examples as they are?
> For the README examples, are you thinking more technical (e.g., using sapply)?
The examples we already have in the paper are great for demonstrating how easy it is to access model information.
I think what we should add, and probably what Alex meant, are examples that show which real-world problems can be solved with insight. Thus my idea to describe in which packages it is already used; but he said that this would mean users must go look up the code in those repositories.
So I think we need examples that do something small, like "predictions" or "credible intervals" etc. Something like a "higher-order problem" that can be solved using insight. This I would add to the paper (before the package examples) and the readme. And I would preserve your examples.
That seems fair. Although the main benefit of using insight is its "universality". Hence, we could also showcase how one can benefit from insight with a "dumb" example like the following:
Let's create a function that nicely prints all the parameters of a model:
print_params <- function(model){
  paste0("My parameters are ",
         paste0(row.names(summary(model)$coefficients), collapse = ", "),
         ", thank you for your attention!")
}
Let's test it:
print_params(lm(Sepal.Length ~ Petal.Width, data = iris))
[1] "My parameters are (Intercept), Petal.Width, thank you for your attention!"
Great, it works! Let's do this with a GAM:
print_params(mgcv::gam(Sepal.Length ~ Petal.Width + s(Petal.Length), data = iris))
[1] "My parameters are , thank you for your attention!"
Oh no, it doesn't work for this class of models.
As access to model information depends on the type of the model in the R ecosystem, we would need to create specific methods for all model types.
"It's a bird! It's a plane! It's insight!"
With insight, you can write a function without having to worry about the model type:
print_params <- function(model){
  paste0("My parameters are ",
         paste0(insight::find_parameters(model, flatten = TRUE), collapse = ", "),
         ", thank you for your attention!")
}
print_params(lm(Sepal.Length ~ Petal.Width, data = iris))
[1] "My parameters are (Intercept), Petal.Width, thank you for your attention!"
print_params(mgcv::gam(Sepal.Length ~ Petal.Width + s(Petal.Length), data = iris))
[1] "My parameters are (Intercept), Petal.Width, s(Petal.Length), thank you for your attention!"
Yeah, that could be a good "summary" example to point out its universality.
Love this idea @DominiqueMakowski. I was missing the "real world" application here and was instead just demonstrating the use of the package. But this is a great approach to showing the functionality of the package. Love it.
Great example! Took two reads (running on low sleep) but clearly conveys the purpose and utility of insight.
> Took two reads (running on low sleep)
That's a fair statistic, and a convincing argument for any text, actually - "if it takes no more than two reads before you've had a coffee, people understand what you've written" :-D
Revised the paper; I'll respond to Alex's comments now.
Closing, Alex has recommended accepting the paper.
Super exciting! Great work guys.
What about submitting this to JOSS, or rOpenSci + JOSS?