CecileProust-Lima / lcmm

R package lcmm
https://CecileProust-Lima.github.io/lcmm/

Entropy calculation #231

Open daavic opened 7 months ago

daavic commented 7 months ago

Thanks again for your very useful lcmm() package.

I just have a question about the entropy calculation.

In the CRAN document the formula is given as

1 - sum[pi_ig * log(pi_ig)] / (N * log(G)), where pi_ig is the posterior probability that subject i belongs to class g.

When I carry this out manually I get different results. Here is the code I am using - I have included negative signs inside the sum, otherwise I get nonsensical answers.

mod.results <- my.gmm$pprob
1 - (with(mod.results, sum(-prob1[class == 1] * log(prob1[class == 1]))) +
     with(mod.results, sum(-prob2[class == 2] * log(prob2[class == 2])))) /
    (dim(mod.results)[1] * log(2))

It's a 2 class model so G = 2.

Is the formula in the CRAN pdf correct?

Don

daavic commented 7 months ago

Can I ask, what are your thoughts about an entropy defined as an R squared type of entropy for use in GMM diagnostics/metrics?

It is used for latent class analysis.

It is defined as

error_prior <- entropy(fit$P)  # Class proportions
error_post <- mean(apply(fit$posterior, 1, entropy))
R2_entropy <- (error_prior - error_post) / error_prior

where entropy is defined as sum(-p * log(p))

See: entropy R^2 statistic (Vermunt & Magidson, 2013, p. 71) daob.nl/wp-content/uploads/2015/07/ESRA-course-slides.pdf by Daniel Oberski
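For concreteness, the R-squared entropy above can be sketched on a toy posterior matrix; the `fit$P` / `fit$posterior` names come from the LCA software in the slides, so this sketch uses generic simulated data instead of any lcmm object.

```r
# Per-distribution entropy, as defined above
entropy <- function(p) sum(-p * log(p))

# Toy posterior probabilities: 10 subjects, 2 classes, rows sum to 1
set.seed(1)
post <- matrix(runif(20), ncol = 2)
post <- post / rowSums(post)

P <- colMeans(post)                           # estimated class proportions
error_prior <- entropy(P)                     # entropy of the prior class sizes
error_post  <- mean(apply(post, 1, entropy))  # mean posterior entropy
R2_entropy  <- (error_prior - error_post) / error_prior
```

Since entropy is concave, error_post can never exceed error_prior, so R2_entropy lies in [0, 1]; values near 1 indicate well-separated classes.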

Thanks again,

Don

daavic commented 7 months ago

My apologies - the question is still open; I closed it by mistake.

daavic commented 7 months ago

I seem to be having difficulties with the permut() function.

xclass(my.gmm.c2, my.gmm.c3)

     1   2   3
1    1   1  60
2   66  78   9

m2 <- permut(my.gmm.c2, order = c(2, 1))
xclass(m2, my.gmm.c3)

     1   2   3
1    1   1  60
2   66  78   9

I get the same table, the labels haven't been switched.

Also, I did this

m2 <- gmm.male.c2
m2b <- permut(m2, order = c(2, 1))

m2b had a much reduced number of subjects and the table could not be drawn; I got the error message "all arguments must have the same length".

What am I doing wrong?

Thanks again.

Don

VivianePhilipps commented 5 months ago

Hi,

sorry for the delay.

You are right about the sign: the entropy is 1 + sum[pi_ig * log(pi_ig)] / (N * log(G)). I'll correct the GitHub version. But in your formula you use the posterior probabilities of belonging to class 1 only for the subjects classified in class 1. You should take these probabilities for all subjects, and the same for class 2.
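In code, the corrected calculation (summing over all subjects and all classes) might look like this minimal sketch; the matrix here is simulated rather than taken from a fitted lcmm object, standing in for the prob1/prob2 columns of pprob.

```r
# Toy posterior probabilities: 10 subjects, 2 classes, rows sum to 1
set.seed(2)
p <- matrix(runif(20), ncol = 2)
p <- p / rowSums(p)

N <- nrow(p)  # number of subjects
G <- ncol(p)  # number of classes

# 1 + sum over ALL subjects i and ALL classes g of pi_ig * log(pi_ig),
# divided by N * log(G) -- no subsetting by assigned class
entropy <- 1 + sum(p * log(p)) / (N * log(G))
```

With this sign convention the entropy lies in [0, 1]: it equals 0 when every posterior is uniform (maximal uncertainty) and approaches 1 when every subject is assigned to one class with probability near 1.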

As you noticed, there are several definitions of entropy. We chose to implement the one proposed by Ramaswamy, but other choices are possible.

For the permut function, I don't know what goes wrong. Could you run

my.gmm.c2_permut <- permut(my.gmm.c2, order = c(2, 1), estim = FALSE)

to check whether the parameters (my.gmm.c2_permut$best) are actually permuted (i.e., different from my.gmm.c2$best)?

Viviane