DominiqueMakowski opened this issue 4 years ago
Let me clarify what you said, past-Dom:
The `describe_nonlinear()` function breaks a nonlinear curve into segments by locating the points where its direction changes.
```r
data <- modelbased::estimate_relation(lm(Sepal.Width ~ poly(Petal.Length, 3), data = iris))
modelbased::describe_nonlinear(data, x = "Petal.Length")
#> Start | End | Length | Change | Slope | R2
#> ---------------------------------------------
#> 1.00 | 3.62 | 0.36 | -1.03 | -0.39 | 0.09
#> 3.62 | 6.90 | 0.54 | 0.51 | 0.16 | 0.09
```

Created on 2021-05-25 by the reprex package (v1.0.0)
However, in a Bayesian or bootstrapped context, we have many iterations of that "curve". So we could, in theory, get the location of a given inversion across all draws, and thus obtain a distribution of these locations. We could then conclude something like: "the relationship between x and y goes from negative to positive at around 0.33 (95% CI [0.21, 0.42])".
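To make the idea concrete, here is a minimal base-R sketch using a nonparametric bootstrap of the cubic `lm()` from the example above. It assumes a finite-difference slope on a fine grid and keeps only the first slope sign change per draw; `find_slope_changes()` is a hypothetical helper, not part of modelbased, and this is not necessarily how `describe_nonlinear()` locates changes internally.

```r
# Hypothetical helper (not part of modelbased): x locations where the
# finite-difference slope of y changes sign, refined by linear interpolation.
find_slope_changes <- function(x, y) {
  slope <- diff(y) / diff(x)            # slope between consecutive grid points
  mid <- (x[-1] + x[-length(x)]) / 2    # x position of each slope estimate
  n <- length(slope)
  idx <- which(sign(slope[-n]) * sign(slope[-1]) < 0)  # opposite signs
  # zero crossing of the slope between mid[idx] and mid[idx + 1]
  mid[idx] - slope[idx] * (mid[idx + 1] - mid[idx]) / (slope[idx + 1] - slope[idx])
}

set.seed(1)
grid <- seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 200)

# One location per bootstrap draw: the first slope change, or NA if none
locations <- replicate(500, {
  d <- iris[sample(nrow(iris), replace = TRUE), ]
  fit <- lm(Sepal.Width ~ poly(Petal.Length, 3), data = d)
  y <- predict(fit, newdata = data.frame(Petal.Length = grid))
  changes <- find_slope_changes(grid, y)
  if (length(changes) > 0) changes[1] else NA_real_
})

# Median and 95% percentile interval of the location across draws
quantile(locations, probs = c(0.025, 0.5, 0.975), na.rm = TRUE)
```

Draws with no sign change return `NA`; their share, `mean(is.na(locations))`, is itself worth reporting alongside the interval.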
This comes with some critical issues:
I'm not sure it's worth looking further into, especially since `describe_nonlinear()` should not be used as an inferential procedure, but rather as a purely descriptive and exploratory insight into a pattern.
Is the idea that this would work with smooths as well? If so, I suggest having a look here: https://gavinsimpson.github.io/gratia/reference/derivatives.html
Also, I think @lindeloev might know a thing or two about change points... (:
I think this is a pretty useful idea and something I could see myself using regularly! Random thoughts:
Terminology: Describing the x where dy/dx = 0 is quite distinct from change point models, where the change point is an extra model parameter and often marks a point of discontinuity. So it could be confusing to call it "points of change". I'd suggest either sticking with "points of inversion" (though that also sounds somewhat discontinuous to me) or, if you have the freedom to change the terminology, perhaps "extrema points".
In my mind, identifying extrema really is descriptive because it's merely a property of the existing fitted parameter(s), not a new model or inference per se. The same goes for reporting the location of changes in curvature (extrema of the derivative). So I think it falls well within the domain of `describe_nonlinear()`.
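Since an extremum is a property of the fitted parameters, for a polynomial model it can be computed directly, without any grid: differentiate the coefficients and keep the real roots of f'(x) = 0 within the observed range of x. A minimal sketch, assuming a raw-coefficient cubic (`poly(..., raw = TRUE)` yields the same fitted curve as the orthogonal `poly()` in the example above, just parameterized differently); `poly_extrema()` is a hypothetical helper.

```r
# Hypothetical helper: real roots of f'(x) = 0 for a polynomial whose
# coefficients b are in increasing order (b0 + b1*x + b2*x^2 + ...),
# restricted to the interval [lower, upper].
poly_extrema <- function(b, lower = -Inf, upper = Inf, tol = 1e-8) {
  d <- b[-1] * seq_len(length(b) - 1)   # coefficients of f'(x)
  r <- polyroot(d)
  r <- sort(Re(r[abs(Im(r)) < tol]))    # keep (numerically) real roots
  r[r >= lower & r <= upper]
}

fit <- lm(Sepal.Width ~ poly(Petal.Length, 3, raw = TRUE), data = iris)
poly_extrema(unname(coef(fit)),
             lower = min(iris$Petal.Length),
             upper = max(iris$Petal.Length))
```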
Number of extrema: There can be between 0 and N-1 extrema for a poly(N) model. For a given MCMC model, some draws may visit (N-1)-extrema models while other draws visit (N-2)-extrema models. So something like the probability of each extremum (the proportion of samples) would need to be reported, at least in cases where not all MCMC samples result in the same number. I'm not sure what the best layout for such a report would be.
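A minimal sketch of reporting the number of extrema as proportions of draws. It assumes bootstrap refits of a raw cubic as a stand-in for MCMC draws (with a Bayesian model one would loop over the posterior draws of the coefficients instead), and `count_extrema()` is a hypothetical helper.

```r
# Hypothetical helper: number of real roots of f'(x) = 0 inside [lower, upper]
# for polynomial coefficients b in increasing order.
count_extrema <- function(b, lower, upper, tol = 1e-8) {
  r <- polyroot(b[-1] * seq_len(length(b) - 1))  # roots of f'
  r <- Re(r[abs(Im(r)) < tol])                   # keep (numerically) real roots
  sum(r >= lower & r <= upper)
}

set.seed(1)
lo <- min(iris$Petal.Length)
hi <- max(iris$Petal.Length)

n_extrema <- replicate(500, {
  d <- iris[sample(nrow(iris), replace = TRUE), ]
  b <- coef(lm(Sepal.Width ~ poly(Petal.Length, 3, raw = TRUE), data = d))
  count_extrema(b, lo, hi)
})

# Proportion of draws with 0, 1 or 2 (= N - 1 for N = 3) extrema in range
prop.table(table(factor(n_extrema, levels = 0:2)))
```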
The mathematical term for this "point of inversion" is "inflection point". I would suggest using that language.
One important thing to bear in mind is that when summarizing multiple curves, the computation needs to be done on the curves (curvewise), not on points collapsing across curves (pointwise). See https://mjskay.github.io/ggdist/reference/curve_interval.html for a discussion.
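The difference matters for extrema. The small simulation below assumes hypothetical draws of quadratic curves a * (x - m)^2 whose minimum m and curvature a vary together across draws. Locating the minimum of each draw and then summarizing (curvewise) recovers the distribution of m, whereas locating the minimum of the pointwise mean curve gives mean(a * m) / mean(a), a different and single value. This is only an illustration of the principle, not ggdist's `curve_interval()` method.

```r
set.seed(1)
x <- seq(-1, 3, length.out = 401)

# Hypothetical draws: minimum location m and curvature a are correlated
m <- runif(1000, 0, 2)
a <- 1 + m
curves <- sapply(seq_along(m), function(i) a[i] * (x - m[i])^2)  # 401 x 1000

# Curvewise: locate the minimum of each draw, then summarize the locations
curvewise <- x[apply(curves, 2, which.min)]
quantile(curvewise, probs = c(0.025, 0.5, 0.975))

# Pointwise: average across draws at each x first, then locate the minimum.
# This lands near mean(a * m) / mean(a), not near mean(m).
pointwise_mean <- rowMeans(curves)
x[which.min(pointwise_mean)]
```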
Ah, sorry, I understood it as the first derivative. "Inflection point" is good. My other thoughts are still relevant, I think.
Maybe it could be generalized so the user can choose which derivative to find the zeros of: f' for extrema, f'' for inflection points.
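For a polynomial fit, such a generalization could amount to differentiating the coefficients `order` times and locating the real roots of the result: `order = 1` gives extrema, `order = 2` inflection points. A minimal sketch; `poly_critical_points()` and its `order` argument are hypothetical, not an existing modelbased interface, and roots are assumed to be simple (so each one is a sign change).

```r
# Hypothetical helper: real roots of the `order`-th derivative of a polynomial
# with coefficients b in increasing order, restricted to [lower, upper].
# order = 1 -> extrema of f; order = 2 -> inflection points of f.
poly_critical_points <- function(b, order = 1, lower = -Inf, upper = Inf,
                                 tol = 1e-8) {
  for (i in seq_len(order)) {
    if (length(b) <= 2) return(numeric(0))  # derivative is constant: no roots
    b <- b[-1] * seq_len(length(b) - 1)     # differentiate once
  }
  r <- polyroot(b)
  r <- sort(Re(r[abs(Im(r)) < tol]))        # keep (numerically) real roots
  r[r >= lower & r <= upper]
}

fit <- lm(Sepal.Width ~ poly(Petal.Length, 3, raw = TRUE), data = iris)
b <- unname(coef(fit))
rng <- range(iris$Petal.Length)
poly_critical_points(b, order = 1, lower = rng[1], upper = rng[2])  # extrema
poly_critical_points(b, order = 2, lower = rng[1], upper = rng[2])  # inflection
```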
By locating the points of change (using `find_inversions`) on all the posterior draws of the link, we could obtain a uni- or multimodal distribution of points of change.
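A minimal sketch of that idea, assuming a normal approximation to the coefficients (`MASS::mvrnorm()` with `coef()` and `vcov()`) as a stand-in for real posterior draws of the link; with a Bayesian model one would use its own draws instead. Every slope sign change from every draw is pooled, so a curve with several changes shows up as several modes.

```r
library(MASS)  # mvrnorm(); MASS ships with R as a recommended package

fit <- lm(Sepal.Width ~ poly(Petal.Length, 3), data = iris)
grid <- data.frame(Petal.Length = seq(1, 6.9, length.out = 200))
X <- model.matrix(delete.response(terms(fit)), data = grid)  # design on grid

set.seed(1)
# Approximate draws of the coefficients (normal approximation, not MCMC)
beta <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))
link <- X %*% t(beta)  # 200 x 1000: one column per draw of the link

# Pool every slope sign change from every draw
changes <- unlist(lapply(seq_len(ncol(link)), function(j) {
  s <- sign(diff(link[, j]))              # sign of the slope between grid points
  i <- which(s[-1] * s[-length(s)] < 0)   # consecutive slopes with opposite sign
  grid$Petal.Length[i + 1]                # grid point where the slope flips
}))

d <- density(changes)  # plot(d) to see whether it is uni- or multimodal
```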