easystats / insight

:crystal_ball: Easy access to model information for various model objects
https://easystats.github.io/insight/
GNU General Public License v3.0
392 stars 39 forks source link

Support for BGGM #199

Closed DominiqueMakowski closed 4 years ago

DominiqueMakowski commented 4 years ago

https://github.com/donaldRwilliams/BGGM

Seems like a nice package that could be supported in bayestestR, parameters and see (and correlation?).

However, the model's class is simply "estimate" so I wonder whether that's specific enough.

model <- BGGM::estimate(mtcars)
class(model)
"estimate"
donaldRwilliams commented 4 years ago

Note that version 2.0.0 is now on CRAN, with full support for tetrachoric, polychoric, rank based ("true" rank based, where the ranks are sampled from truncated normal distribution), and of course the continuous case. Also, now the samplers are written in c++.

Here are the options for estimating correlations

First note the classes have changed

fit <- estimate(mtcars)
class(fit)
"BGGM"     "estimate" "default" 

Tetrachoric

# binary data
Y <- BGGM::women_math[,1:3]

# fit
tetra <- zero_order_cors(Y, type = "binary")

# correlation mean
tetra$R_mean

# correlations in 3d array
tetra$R

Polychoric

Y <- BGGM::ptsd[,1:3]

polychor <- zero_order_cors(Y +1, type = "ordinal")

Rank based (Gaussian copula)

# ordinal
Y <- BGGM::ptsd[,1:3]

rank_gc <- zero_order_cors(Y, type = "mixed")

Note that there are no summary, print, etc. functions for correlations because the focus is on the GGM. Accordingly, it is possible to estimate the partial correlations and then transform after the fact. But there is plenty of documentation here that show examples of how to summarize, plot the credible intervals, etc.

# ordinal. "mixed" refers to mixed data, 
# but can be used with only ordinal.
Y <- BGGM::ptsd[,1:3]

# fit model
rank_gc <- estimate(Y, type = "mixed")

# pcor to cor
pcor_to_cor(rank_gc)

Note finally the Bayes factor testing methods are specifically for the partial correlations (see for example explore) and not sure the prior distribution would be ideal for testing correlations. For this, we plan to add the testing of correlations BFpack.

strengejacke commented 4 years ago

What would be the posterior "distribution"? $pcor_mat?

donaldRwilliams commented 4 years ago

That is the posterior mean. There is a function to access that directly.

Y <- BGGM::ptsd[,1:3]

rank_gc <- estimate(Y, type = "mixed")

pcor_mat(rank_gc)

Then if you want the posterior samples for the partial correlations

posterior_samples(rank_gc)

The posterior for the correlations is obtained with either pcor_to_cor or zero_order_cors

strengejacke commented 4 years ago

ok, insight::get_parameters() currently returns the same as pcor_mat(). Now, I'm not sure what get_parameters() should preferably return, pcor_mat() or posterior_samples()?

If I understand right, pcor_mat(model) gives the same correlation coefficients as colMeans(posterior_samples(model)). Following the logic of get_parameters() for Bayesian models, I'd say we return posterior_samples(model) - however, when used in correlation, @DominiqueMakowski has to fiddle around with the print-method... Dom, you must say what you imagine for bayestestR and correlation... For bayestestR, posterior_samples(model) makes more sense. See here:

library(BGGM)
data("mtcars")
model <- BGGM::estimate(mtcars)

# correlation table
pcor_mat(model)

# posterior mean from posterior samples of correlation coefficients
colMean(posterior_samples(model))

# posterior samples of correlation coefficients
head(posterior_samples(model))

(can't create a reprex, because BGGM requires .Random.seed to be present somewhere?!?)

donaldRwilliams commented 4 years ago

The one note about posterior_samples is that, for the case of binary and ordinal data, it also returns additional parameters. This is because "under the hood" a multivariate probit model is being estimated so returns the intercepts when formula = NULL

For the case of the partial correlation, the following would work

Y <- BGGM::ptsd[,1:3]

# notice the +1 (first category must be 1)
polychor <- estimate(Y+1, type = "ordinal")

samps <- posterior_samples(tetra)

samps[, !grepl( "(Intercept)" , colnames( samps ) )]
donaldRwilliams commented 4 years ago

Also to be clear, the following returns partial correlations. For correlations, see zero_order_cors which will return an array of the sampled correlation matrices.

samps <- posterior_samples(tetra)
strengejacke commented 4 years ago

what is tetra?

donaldRwilliams commented 4 years ago
# binary data
Y <- BGGM::women_math[,1:3]

# fit
tetra <- estimate(Y, type = "binary")

# partial correlations
samps <- posterior_samples(tetra)
DominiqueMakowski commented 4 years ago

ok, insight::get_parameters() currently returns the same as pcor_mat(). Now, I'm not sure what get_parameters() should preferably return, pcor_mat() or posterior_samples()?

@strengejacke Yes I agree, get_parameters should return the posterior distribution rather than postprocessed estimates. It's good also for how correlation works (since everything relies on insight)

The one note about posterior_samples is that, for the case of binary and ordinal data, it also returns additional parameters.

It's okay, as long as these additional parameters can be conceived as parameters (in the broader sense of estimates), get_parameters can return them as well, if necessary.

strengejacke commented 4 years ago

@donaldRwilliams what kind or type of additional parameters are returned? only intercepts?

strengejacke commented 4 years ago

Dom, is this ok?

> m <- estimate(ptsd[, 1:4] + 1, type = "ordinal", iter = 250)
BGGM: Posterior Sampling 
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
BGGM: Finished
> find_parameters(m)
$correlation
[1] "B1--B2" "B1--B3" "B2--B3" "B1--B4" "B2--B4" "B3--B4"

> 
> find_parameters(m, "correlation")
$correlation
[1] "B1--B2" "B1--B3" "B2--B3" "B1--B4" "B2--B4" "B3--B4"

> 
> find_parameters(m, "intercept")
$intercept
[1] "B1" "B2" "B3" "B4"

> 
> find_parameters(m, "all")
$intercept
[1] "B1" "B2" "B3" "B4"

$correlation
[1] "B1--B2" "B1--B3" "B2--B3" "B1--B4" "B2--B4" "B3--B4"

> 
> 
> head(get_parameters(m))
      B1--B2    B1--B3    B2--B3    B1--B4       B2--B4    B3--B4
1 0.06271854 0.2427961 0.5071544 0.3368148  0.206401239 0.1764215
2 0.17704425 0.1100967 0.5914172 0.4341394  0.037202144 0.3044216
3 0.21025450 0.1218805 0.5502681 0.2989079  0.024295643 0.4361668
4 0.25463391 0.2463020 0.4888627 0.3511248  0.005410656 0.3560969
5 0.27132060 0.1452279 0.5675196 0.3868580 -0.111311547 0.4403248
6 0.13040916 0.1787590 0.6628062 0.4669703 -0.203335744 0.4337148
> 
> head(get_parameters(m, "correlation"))
      B1--B2    B1--B3    B2--B3    B1--B4       B2--B4    B3--B4
1 0.06271854 0.2427961 0.5071544 0.3368148  0.206401239 0.1764215
2 0.17704425 0.1100967 0.5914172 0.4341394  0.037202144 0.3044216
3 0.21025450 0.1218805 0.5502681 0.2989079  0.024295643 0.4361668
4 0.25463391 0.2463020 0.4888627 0.3511248  0.005410656 0.3560969
5 0.27132060 0.1452279 0.5675196 0.3868580 -0.111311547 0.4403248
6 0.13040916 0.1787590 0.6628062 0.4669703 -0.203335744 0.4337148
> 
> head(get_parameters(m, "intercept"))
  B1_(Intercept) B2_(Intercept) B3_(Intercept) B4_(Intercept)
1      0.8898790      0.5065383      0.3587859      1.0772496
2      1.0284449      0.5003012      0.2768598      1.2143462
3      0.9359056      0.6566056      0.2096932      0.7070619
4      0.7775327      0.3991683      0.3692782      0.9101484
5      1.1894994      0.3041367      0.1150038      0.9993084
6      1.0227258      0.4969963      0.1857669      0.8361187
> 
> head(get_parameters(m, "all"))
      B1--B2    B1--B3    B2--B3    B1--B4       B2--B4    B3--B4 B1_(Intercept) B2_(Intercept)
1 0.06271854 0.2427961 0.5071544 0.3368148  0.206401239 0.1764215      0.8898790      0.5065383
2 0.17704425 0.1100967 0.5914172 0.4341394  0.037202144 0.3044216      1.0284449      0.5003012
3 0.21025450 0.1218805 0.5502681 0.2989079  0.024295643 0.4361668      0.9359056      0.6566056
4 0.25463391 0.2463020 0.4888627 0.3511248  0.005410656 0.3560969      0.7775327      0.3991683
5 0.27132060 0.1452279 0.5675196 0.3868580 -0.111311547 0.4403248      1.1894994      0.3041367
6 0.13040916 0.1787590 0.6628062 0.4669703 -0.203335744 0.4337148      1.0227258      0.4969963
  B3_(Intercept) B4_(Intercept)
1      0.3587859      1.0772496
2      0.2768598      1.2143462
3      0.2096932      0.7070619
4      0.3692782      0.9101484
5      0.1150038      0.9993084
6      0.1857669      0.8361187
> 
> head(clean_parameters(m))
  Parameter Effects   Component Cleaned_Parameter
1        B1   fixed   intercept                B1
2        B2   fixed   intercept                B2
3        B3   fixed   intercept                B3
4        B4   fixed   intercept                B4
5    B1--B2   fixed conditional            B1--B2
6    B1--B3   fixed conditional            B1--B3
DominiqueMakowski commented 4 years ago

looks good to me!

donaldRwilliams commented 4 years ago

@strengejacke "what kind or type of additional parameters are returned"

with formula = NULL only intercepts are returned.

strengejacke commented 4 years ago

Do you have a "more complex" example? So I can check that the implementation is sufficient? After that, I could close this issue.

donaldRwilliams commented 4 years ago

@strengejacke What kind of example do you have in mind ?

strengejacke commented 4 years ago

Something with formula involved that also returns additional parameters apart from Intercept.

donaldRwilliams commented 4 years ago

@strengejacke Apologies on the delay.

Here is an example, where partial correlations or correlations are estimated, controlling for education.

Y <- bfi[,c(1:5, 27)]

fit <- estimate(Y, formula = ~ as.factor(education))

samps <- posterior_samples(fit)

colnames(samps)

[1] "A1--A2"                   "A1--A3"                   "A2--A3"                   "A1--A4"                  
[5] "A2--A4"                   "A3--A4"                   "A1--A5"                   "A2--A5"                  
[9] "A3--A5"                   "A4--A5"                   "A1_(Intercept)"           "A1_as.factor(education)2"
[13] "A1_as.factor(education)3" "A1_as.factor(education)4" "A1_as.factor(education)5" "A2_(Intercept)"          
[17] "A2_as.factor(education)2" "A2_as.factor(education)3" "A2_as.factor(education)4" "A2_as.factor(education)5"
[21] "A3_(Intercept)"           "A3_as.factor(education)2" "A3_as.factor(education)3" "A3_as.factor(education)4"
[25] "A3_as.factor(education)5" "A4_(Intercept)"           "A4_as.factor(education)2" "A4_as.factor(education)3"
[29] "A4_as.factor(education)4" "A4_as.factor(education)5" "A5_(Intercept)"           "A5_as.factor(education)2"
[33] "A5_as.factor(education)3" "A5_as.factor(education)4" "A5_as.factor(education)5"
donaldRwilliams commented 4 years ago

Also, what is the best way to allow users to know BGGM is supported ? Perhaps add insight to Suggests ?

strengejacke commented 4 years ago

insight::is_model_supported() returns TRUE. Here you find BGGM listed. Once we also add tests, we would add BGGM as Suggests anyway, but not sure if/when we will add tests.

strengejacke commented 4 years ago

@strengejacke Apologies on the delay.

Here is an example, where partial correlations or correlations are estimated, controlling for education.

Y <- bfi[,c(1:5, 27)]

fit <- estimate(Y, formula = ~ as.factor(education))

samps <- posterior_samples(fit)

colnames(samps)

[1] "A1--A2"                   "A1--A3"                   "A2--A3"                   "A1--A4"                  
[5] "A2--A4"                   "A3--A4"                   "A1--A5"                   "A2--A5"                  
[9] "A3--A5"                   "A4--A5"                   "A1_(Intercept)"           "A1_as.factor(education)2"
[13] "A1_as.factor(education)3" "A1_as.factor(education)4" "A1_as.factor(education)5" "A2_(Intercept)"          
[17] "A2_as.factor(education)2" "A2_as.factor(education)3" "A2_as.factor(education)4" "A2_as.factor(education)5"
[21] "A3_(Intercept)"           "A3_as.factor(education)2" "A3_as.factor(education)3" "A3_as.factor(education)4"
[25] "A3_as.factor(education)5" "A4_(Intercept)"           "A4_as.factor(education)2" "A4_as.factor(education)3"
[29] "A4_as.factor(education)4" "A4_as.factor(education)5" "A5_(Intercept)"           "A5_as.factor(education)2"
[33] "A5_as.factor(education)3" "A5_as.factor(education)4" "A5_as.factor(education)5"

Thanks a lot... How would you call that additional parameters, what is their type?

Currently, we have the components conditional and intercept (maybe we change "conditional" to "correlation"), but how would we call the other additional parameters? Maybe these actually belong to the "conditional" model part?

> head(clean_parameters(m))
  Parameter Effects   Component Cleaned_Parameter
1        B1   fixed   intercept                B1
2        B2   fixed   intercept                B2
3        B3   fixed   intercept                B3
4        B4   fixed   intercept                B4
5    B1--B2   fixed conditional            B1--B2
6    B1--B3   fixed conditional            B1--B3
donaldRwilliams commented 4 years ago

Hmm. So the conditional, in this case, would be a partial correlation so I think changing the name makes sense, but perhaps somehow emphasize they are partial correlations (partial_correlation ?)

Now the additional parameters are regression coefficients. I wonder what brms calls them (I sometimes follow conventions of brms) ? Maybe coefficient would work ?