Probabilistic location of points of change for Bayesian models

DominiqueMakowski commented 4 years ago

By locating the points of change (using find_inversions) on each posterior draw of the link, we could obtain a (uni- or multi-modal) distribution of points of change.

DominiqueMakowski commented 3 years ago

Let me clarify what you said, past-Dom:

The describe_nonlinear() function breaks a nonlinear curve into segments by locating the points where its direction changes.

data <- modelbased::estimate_relation(lm(Sepal.Width ~ poly(Petal.Length, 3), data = iris))
modelbased::describe_nonlinear(data, x = "Petal.Length")
#> Start |  End | Length | Change | Slope |   R2
#> ---------------------------------------------
#> 1.00  | 3.62 |   0.36 |  -1.03 | -0.39 | 0.09
#> 3.62  | 6.90 |   0.54 |   0.51 |  0.16 | 0.09

Created on 2021-05-25 by the reprex package (v1.0.0)

However, in a Bayesian or bootstrapped context, we have many iterations (draws) of that "curve". So we could, in theory, locate a given inversion in every draw and thus obtain a distribution of its location. We could then conclude something like: "the relationship between x and y goes from negative to positive at around 0.33 (95% CI [0.21, 0.42])".
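A minimal base-R sketch of this idea, under stated assumptions: the draws are simulated (one parabola per draw, with its minimum jittered around 0.33) rather than taken from a fitted model, and `slope_turns_up()` is a hypothetical helper, not part of modelbased. It finds where the finite-difference slope turns from negative to positive in each draw, keeps the first such point, and summarizes the locations with a median and a 95% quantile interval.

```r
set.seed(1)

# Hypothetical helper (not a modelbased function): x-locations where the
# slope of a curve evaluated on a grid turns from negative to positive.
slope_turns_up <- function(x, y) {
  slope <- diff(y) / diff(x)          # finite-difference slope
  xm <- (x[-1] + x[-length(x)]) / 2   # slopes are located at segment midpoints
  n <- length(slope)
  i <- which(slope[-n] < 0 & slope[-1] >= 0)
  # linear interpolation of the zero crossing between midpoints i and i + 1
  xm[i] - slope[i] * (xm[i + 1] - xm[i]) / (slope[i + 1] - slope[i])
}

# Simulated stand-in for posterior draws of a fitted curve: each draw is a
# parabola whose minimum sits at a slightly different x.
x <- seq(0, 1, length.out = 201)
true_min <- rnorm(1000, mean = 0.33, sd = 0.05)
draws <- t(sapply(true_min, function(m) (x - m)^2))  # 1000 draws x 201 grid points

# Locate the inversion in every draw (NA if a draw has none), then summarize
locations <- apply(draws, 1, function(y) slope_turns_up(x, y)[1])
c(median = median(locations, na.rm = TRUE),
  quantile(locations, c(0.025, 0.975), na.rm = TRUE))
```

Keeping only the first inversion per draw sidesteps, rather than solves, the matching problem listed among the issues; draws without any inversion simply become NA.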

This comes with some critical issues:

- How can we be sure that a given "point" of inversion found in one draw is the same one found, at a different location, in another draw?
- How do we deal with draws that show a different pattern or number of inversions, and how do we summarize them?
- I'm not sure this would even be a valid approach, especially since some packages looking at "points of change" seem to have emerged since this issue was created.

I'm not sure this is worth looking further into, especially since describe_nonlinear should not be used as an inferential procedure, but rather as a purely descriptive and exploratory look at a pattern.

mattansb commented 3 years ago

Is the idea that this would work with smooths as well? If so, I suggest having a look here: https://gavinsimpson.github.io/gratia/reference/derivatives.html

Also, I think @lindeloev might know a thing or two about change points... (:

lindeloev commented 3 years ago

I think this is a pretty useful idea and something I could see myself using regularly! Random thoughts:

- Terminology: Describing the x where dy/dx = 0 is quite distinct from change point models, where the change point is an extra model parameter and often marks a point of discontinuity. So it could be confusing to call it "points of change". I'd suggest either sticking with "points of inversion" (though that also sounds somewhat discontinuous to me) or, if you have the freedom to change the terminology, using "extrema points".
- In my mind, identifying extrema really is descriptive because it's merely a property of the existing fitted parameter(s), not a new model or inference per se, just as reporting the location of changes in curvature (extrema of the derivative) would be. So I think it falls well within the domain of describe_nonlinear.
- Number of extrema: A poly(N) model can have anywhere between 0 and N-1 extrema. For a given MCMC model, some draws may visit (N-1)-extrema models while other draws visit (N-2)-extrema models. So something like the probability of each extremum (the proportion of samples in which it occurs) would need to be reported, at least when not all MCMC samples yield the same number. Not sure what the best layout for a report would be.
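To make the "proportion of samples" idea concrete, here is a hedged sketch for a cubic. The coefficient draws are simulated (not from a real model), `count_extrema()` is a hypothetical helper, and only extrema inside an assumed observed range of x (here -3 to 3) are counted. Extrema are taken to be the real roots of f'(x) = b1 + 2*b2*x + 3*b3*x^2 at which f' changes sign, so a double root is not counted.

```r
set.seed(2)

# Hypothetical helper: number of local extrema of
# f(x) = b0 + b1*x + b2*x^2 + b3*x^3 strictly inside (lower, upper).
# Extrema are the sign-changing roots of f'(x) = b1 + 2*b2*x + 3*b3*x^2.
count_extrema <- function(b1, b2, b3, lower, upper) {
  qa <- 3 * b3; qb <- 2 * b2; qc <- b1
  if (qa == 0) {                       # f' is linear (or constant)
    if (qb == 0) return(0L)
    roots <- -qc / qb
  } else {
    disc <- qb^2 - 4 * qa * qc
    if (disc <= 0) return(0L)          # no real root, or a double root (no sign change)
    roots <- (-qb + c(-1, 1) * sqrt(disc)) / (2 * qa)
  }
  sum(roots > lower & roots < upper)
}

# Simulated stand-in for posterior draws of the cubic coefficients
# (b0 does not affect where the extrema are)
n_draws <- 4000
b1 <- rnorm(n_draws, mean = -1, sd = 0.5)
b2 <- rnorm(n_draws, mean = 0, sd = 0.3)
b3 <- rnorm(n_draws, mean = 0.4, sd = 0.2)

n_extrema <- mapply(count_extrema, b1, b2, b3,
                    MoreArgs = list(lower = -3, upper = 3))

# Proportion of draws with 0, 1 or 2 extrema in the observed range
prop.table(table(factor(n_extrema, levels = 0:2)))
```

Reporting such a table next to the location summaries would flag when the draws disagree on how many extrema there are.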

bwiernik commented 3 years ago

The mathematical term for this "point of inversion" is "inflection point". I would suggest using that language.

bwiernik commented 3 years ago

One important thing to bear in mind is that when summarizing multiple curves, the computation needs to be done on the curves (curvewise), not on points collapsing across curves (pointwise). See https://mjskay.github.io/ggdist/reference/curve_interval.html for discussion.
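A small simulated example of why the order of operations matters. It is a toy base-R illustration, not ggdist's `curve_interval()`: half of the hypothetical draws have their minimum at x = 0.2 and half at x = 0.8. Locating the minimum within each draw (curvewise) preserves that bimodality, whereas first taking the pointwise median across draws and then locating its minimum yields x = 0.5, a location that no individual draw has.

```r
# Toy illustration (simulated, bimodal): half the draws have their minimum
# at x = 0.2, the other half at x = 0.8.
x <- seq(0, 1, length.out = 101)
mins <- rep(c(0.2, 0.8), each = 500)
draws <- t(sapply(mins, function(m) (x - m)^2))  # 1000 draws x 101 grid points

# Curvewise: locate the minimum within each draw, then summarize the locations
curvewise <- x[apply(draws, 1, which.min)]
table(curvewise)                  # all mass at 0.2 and 0.8

# Pointwise: collapse across draws at each x first, then locate the minimum
pointwise_median <- apply(draws, 2, median)
x[which.min(pointwise_median)]    # 0.5, a location no single draw has
```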

lindeloev commented 3 years ago

Ah, sorry, I understood it as the first derivative. "Inflection point" is good. My other thoughts are still relevant, I think.

Maybe it could be generalized so that the user can choose which derivative to find the zeros of:

- f': extrema
- f'': inflection points
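One way such a generalization could look, as a hedged sketch: `find_derivative_zeros()` is a hypothetical function (not an existing modelbased API) that takes repeated finite differences of a curve evaluated on a grid and linearly interpolates where the chosen derivative changes sign, so `order = 1` returns extrema and `order = 2` inflection points. The test curve x^3 - 3x has extrema at -1 and 1 and an inflection point at 0.

```r
# Hypothetical helper (not an existing modelbased function): zeros of the
# `order`-th finite-difference derivative of a curve evaluated on a grid.
# order = 1 -> local extrema (f' = 0); order = 2 -> inflection points (f'' = 0)
find_derivative_zeros <- function(x, y, order = 1) {
  for (k in seq_len(order)) {
    y <- diff(y) / diff(x)              # finite-difference derivative
    x <- (x[-1] + x[-length(x)]) / 2    # located at interval midpoints
  }
  n <- length(y)
  # intervals where the derivative changes sign (or reaches exactly zero
  # coming from a nonzero value; a zero that only touches is also reported)
  i <- which((y[-n] < 0 & y[-1] >= 0) | (y[-n] > 0 & y[-1] <= 0))
  # linear interpolation of the crossing within each interval
  x[i] - y[i] * (x[i + 1] - x[i]) / (y[i + 1] - y[i])
}

x <- seq(-3, 3, length.out = 600)
y <- x^3 - 3 * x                                   # f'(x) = 3x^2 - 3, f''(x) = 6x
round(find_derivative_zeros(x, y, order = 1), 3)   # approximately -1 and 1
round(find_derivative_zeros(x, y, order = 2), 3)   # approximately 0
```

Applied to every posterior draw (for instance with `apply()` over a draws-by-grid matrix), this would give the curvewise distribution of extrema or inflection points rather than a single pointwise summary.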

(easystats / modelbased, issue #37)