Closed DominiqueMakowski closed 4 years ago
Note that version 2.0.0
is now on CRAN, with full support for tetrachoric, polychoric, rank based ("true" rank based, where the ranks are sampled from truncated normal distribution), and of course the continuous case. Also, now the samplers are written in c++
.
Here are the options for estimating correlations
First note the classes have changed
fit <- estimate(mtcars)
class(fit)
"BGGM" "estimate" "default"
Tetrachoric
# binary data
Y <- BGGM::women_math[,1:3]
# fit
tetra <- zero_order_cors(Y, type = "binary")
# correlation mean
tetra$R_mean
# correlations in 3d array
tetra$R
Polychoric
Y <- BGGM::ptsd[,1:3]
polychor <- zero_order_cors(Y +1, type = "ordinal")
Rank based (Gaussian copula)
# ordinal
Y <- BGGM::ptsd[,1:3]
rank_gc <- zero_order_cors(Y, type = "mixed")
Note that there are no summary, print, etc. functions for correlations because the focus is on the GGM. Accordingly, it is possible to estimate the partial correlations and then transform after the fact. But there is plenty of documentation here that show examples of how to summarize, plot the credible intervals, etc.
# ordinal. "mixed" refers to mixed data,
# but can be used with only ordinal.
Y <- BGGM::ptsd[,1:3]
# fit model
rank_gc <- estimate(Y, type = "mixed")
# pcor to cor
pcor_to_cor(rank_gc)
Note finally the Bayes factor testing methods are specifically for the partial correlations (see for example explore
) and not sure the prior distribution would be ideal for testing correlations. For this, we plan to add the testing of correlations BFpack
.
What would be the posterior "distribution"? $pcor_mat
?
That is the posterior mean. There is a function to access that directly.
Y <- BGGM::ptsd[,1:3]
rank_gc <- estimate(Y, type = "mixed")
pcor_mat(rank_gc)
Then if you want the posterior samples for the partial correlations
posterior_samples(rank_gc)
The posterior for the correlations is obtained with either pcor_to_cor
or zero_order_cors
ok, insight::get_parameters()
currently returns the same as pcor_mat()
. Now, I'm not sure what get_parameters()
should preferably return, pcor_mat()
or posterior_samples()
?
If I understand right, pcor_mat(model)
gives the same correlation coefficients as colMeans(posterior_samples(model))
. Following the logic of get_parameters()
for Bayesian models, I'd say we return posterior_samples(model)
- however, when used in correlation, @DominiqueMakowski has to fiddle around with the print-method... Dom, you must say what you imagine for bayestestR and correlation... For bayestestR, posterior_samples(model)
makes more sense. See here:
library(BGGM)
data("mtcars")
model <- BGGM::estimate(mtcars)
# correlation table
pcor_mat(model)
# posterior mean from posterior samples of correlation coefficients
colMean(posterior_samples(model))
# posterior samples of correlation coefficients
head(posterior_samples(model))
(can't create a reprex, because BGGM requires .Random.seed
to be present somewhere?!?)
The one note about posterior_samples
is that, for the case of binary and ordinal data, it also returns additional parameters. This is because "under the hood" a multivariate probit model is being estimated so returns the intercepts when formula = NULL
For the case of the partial correlation, the following would work
Y <- BGGM::ptsd[,1:3]
# notice the +1 (first category must be 1)
polychor <- estimate(Y+1, type = "ordinal")
samps <- posterior_samples(tetra)
samps[, !grepl( "(Intercept)" , colnames( samps ) )]
Also to be clear, the following returns partial correlations. For correlations, see zero_order_cors
which will return an array of the sampled correlation matrices.
samps <- posterior_samples(tetra)
what is tetra
?
# binary data
Y <- BGGM::women_math[,1:3]
# fit
tetra <- estimate(Y, type = "binary")
# partial correlations
samps <- posterior_samples(tetra)
ok, insight::get_parameters() currently returns the same as pcor_mat(). Now, I'm not sure what get_parameters() should preferably return, pcor_mat() or posterior_samples()?
@strengejacke Yes I agree, get_parameters should return the posterior distribution rather than postprocessed estimates. It's good also for how correlation
works (since everything relies on insight)
The one note about posterior_samples is that, for the case of binary and ordinal data, it also returns additional parameters.
It's okay, as long as these additional parameters can be conceived as parameters (in the broader sense of estimates), get_parameters can return them as well, if necessary.
@donaldRwilliams what kind or type of additional parameters are returned? only intercepts?
Dom, is this ok?
> m <- estimate(ptsd[, 1:4] + 1, type = "ordinal", iter = 250)
BGGM: Posterior Sampling
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
BGGM: Finished
> find_parameters(m)
$correlation
[1] "B1--B2" "B1--B3" "B2--B3" "B1--B4" "B2--B4" "B3--B4"
>
> find_parameters(m, "correlation")
$correlation
[1] "B1--B2" "B1--B3" "B2--B3" "B1--B4" "B2--B4" "B3--B4"
>
> find_parameters(m, "intercept")
$intercept
[1] "B1" "B2" "B3" "B4"
>
> find_parameters(m, "all")
$intercept
[1] "B1" "B2" "B3" "B4"
$correlation
[1] "B1--B2" "B1--B3" "B2--B3" "B1--B4" "B2--B4" "B3--B4"
>
>
> head(get_parameters(m))
B1--B2 B1--B3 B2--B3 B1--B4 B2--B4 B3--B4
1 0.06271854 0.2427961 0.5071544 0.3368148 0.206401239 0.1764215
2 0.17704425 0.1100967 0.5914172 0.4341394 0.037202144 0.3044216
3 0.21025450 0.1218805 0.5502681 0.2989079 0.024295643 0.4361668
4 0.25463391 0.2463020 0.4888627 0.3511248 0.005410656 0.3560969
5 0.27132060 0.1452279 0.5675196 0.3868580 -0.111311547 0.4403248
6 0.13040916 0.1787590 0.6628062 0.4669703 -0.203335744 0.4337148
>
> head(get_parameters(m, "correlation"))
B1--B2 B1--B3 B2--B3 B1--B4 B2--B4 B3--B4
1 0.06271854 0.2427961 0.5071544 0.3368148 0.206401239 0.1764215
2 0.17704425 0.1100967 0.5914172 0.4341394 0.037202144 0.3044216
3 0.21025450 0.1218805 0.5502681 0.2989079 0.024295643 0.4361668
4 0.25463391 0.2463020 0.4888627 0.3511248 0.005410656 0.3560969
5 0.27132060 0.1452279 0.5675196 0.3868580 -0.111311547 0.4403248
6 0.13040916 0.1787590 0.6628062 0.4669703 -0.203335744 0.4337148
>
> head(get_parameters(m, "intercept"))
B1_(Intercept) B2_(Intercept) B3_(Intercept) B4_(Intercept)
1 0.8898790 0.5065383 0.3587859 1.0772496
2 1.0284449 0.5003012 0.2768598 1.2143462
3 0.9359056 0.6566056 0.2096932 0.7070619
4 0.7775327 0.3991683 0.3692782 0.9101484
5 1.1894994 0.3041367 0.1150038 0.9993084
6 1.0227258 0.4969963 0.1857669 0.8361187
>
> head(get_parameters(m, "all"))
B1--B2 B1--B3 B2--B3 B1--B4 B2--B4 B3--B4 B1_(Intercept) B2_(Intercept)
1 0.06271854 0.2427961 0.5071544 0.3368148 0.206401239 0.1764215 0.8898790 0.5065383
2 0.17704425 0.1100967 0.5914172 0.4341394 0.037202144 0.3044216 1.0284449 0.5003012
3 0.21025450 0.1218805 0.5502681 0.2989079 0.024295643 0.4361668 0.9359056 0.6566056
4 0.25463391 0.2463020 0.4888627 0.3511248 0.005410656 0.3560969 0.7775327 0.3991683
5 0.27132060 0.1452279 0.5675196 0.3868580 -0.111311547 0.4403248 1.1894994 0.3041367
6 0.13040916 0.1787590 0.6628062 0.4669703 -0.203335744 0.4337148 1.0227258 0.4969963
B3_(Intercept) B4_(Intercept)
1 0.3587859 1.0772496
2 0.2768598 1.2143462
3 0.2096932 0.7070619
4 0.3692782 0.9101484
5 0.1150038 0.9993084
6 0.1857669 0.8361187
>
> head(clean_parameters(m))
Parameter Effects Component Cleaned_Parameter
1 B1 fixed intercept B1
2 B2 fixed intercept B2
3 B3 fixed intercept B3
4 B4 fixed intercept B4
5 B1--B2 fixed conditional B1--B2
6 B1--B3 fixed conditional B1--B3
looks good to me!
@strengejacke "what kind or type of additional parameters are returned"
with formula = NULL
only intercepts are returned.
Do you have a "more complex" example? So I can check that the implementation is sufficient? After that, I could close this issue.
@strengejacke What kind of example do you have in mind ?
Something with formula
involved that also returns additional parameters apart from Intercept
.
@strengejacke Apologies on the delay.
Here is an example, where partial correlations or correlations are estimated, controlling for education.
Y <- bfi[,c(1:5, 27)]
fit <- estimate(Y, formula = ~ as.factor(education))
samps <- posterior_samples(fit)
colnames(samps)
[1] "A1--A2" "A1--A3" "A2--A3" "A1--A4"
[5] "A2--A4" "A3--A4" "A1--A5" "A2--A5"
[9] "A3--A5" "A4--A5" "A1_(Intercept)" "A1_as.factor(education)2"
[13] "A1_as.factor(education)3" "A1_as.factor(education)4" "A1_as.factor(education)5" "A2_(Intercept)"
[17] "A2_as.factor(education)2" "A2_as.factor(education)3" "A2_as.factor(education)4" "A2_as.factor(education)5"
[21] "A3_(Intercept)" "A3_as.factor(education)2" "A3_as.factor(education)3" "A3_as.factor(education)4"
[25] "A3_as.factor(education)5" "A4_(Intercept)" "A4_as.factor(education)2" "A4_as.factor(education)3"
[29] "A4_as.factor(education)4" "A4_as.factor(education)5" "A5_(Intercept)" "A5_as.factor(education)2"
[33] "A5_as.factor(education)3" "A5_as.factor(education)4" "A5_as.factor(education)5"
Also, what is the best way to allow users to know BGGM
is supported ? Perhaps add insight
to Suggests ?
insight::is_model_supported()
returns TRUE
. Here you find BGGM
listed. Once we also add tests, we would add BGGM as Suggests anyway, but not sure if/when we will add tests.
@strengejacke Apologies on the delay.
Here is an example, where partial correlations or correlations are estimated, controlling for education.
Y <- bfi[,c(1:5, 27)] fit <- estimate(Y, formula = ~ as.factor(education)) samps <- posterior_samples(fit) colnames(samps) [1] "A1--A2" "A1--A3" "A2--A3" "A1--A4" [5] "A2--A4" "A3--A4" "A1--A5" "A2--A5" [9] "A3--A5" "A4--A5" "A1_(Intercept)" "A1_as.factor(education)2" [13] "A1_as.factor(education)3" "A1_as.factor(education)4" "A1_as.factor(education)5" "A2_(Intercept)" [17] "A2_as.factor(education)2" "A2_as.factor(education)3" "A2_as.factor(education)4" "A2_as.factor(education)5" [21] "A3_(Intercept)" "A3_as.factor(education)2" "A3_as.factor(education)3" "A3_as.factor(education)4" [25] "A3_as.factor(education)5" "A4_(Intercept)" "A4_as.factor(education)2" "A4_as.factor(education)3" [29] "A4_as.factor(education)4" "A4_as.factor(education)5" "A5_(Intercept)" "A5_as.factor(education)2" [33] "A5_as.factor(education)3" "A5_as.factor(education)4" "A5_as.factor(education)5"
Thanks a lot... How would you call that additional parameters, what is their type
?
Currently, we have the components conditional
and intercept
(maybe we change "conditional" to "correlation"), but how would we call the other additional parameters? Maybe these actually belong to the "conditional" model part?
> head(clean_parameters(m))
Parameter Effects Component Cleaned_Parameter
1 B1 fixed intercept B1
2 B2 fixed intercept B2
3 B3 fixed intercept B3
4 B4 fixed intercept B4
5 B1--B2 fixed conditional B1--B2
6 B1--B3 fixed conditional B1--B3
Hmm. So the conditional, in this case, would be a partial correlation so I think changing the name makes sense, but perhaps somehow emphasize they are partial correlations (partial_correlation
?)
Now the additional parameters are regression coefficients. I wonder what brms
calls them (I sometimes follow conventions of brms) ? Maybe coefficient
would work ?
https://github.com/donaldRwilliams/BGGM
Seems like a nice package that could be supported in bayestestR, parameters and see (and correlation?).
However, the model's class is simply "estimate" so I wonder whether that's specific enough.