hberger opened this issue 1 year ago
Just an additional comment: I can work around the problem by using custom initial values derived from k-means on the data:

```r
km <- kmeans(d, 2)
mu_km <- km$centers[, 1]
sd_km <- tapply(d, km$cluster, sd)
mixmdl <- normalmixEM(d, k = 2, fast = TRUE, mu = mu_km, sigma = sd_km)
```

This leads to stable estimates of the correct population parameters.
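For arbitrary k, the same initialization can be packaged into a small helper that also derives starting lambda values from the cluster proportions. A sketch in base R (`kmeans_start` is a made-up name, not part of mixtools; only `stats::kmeans` is used):

```r
# Sketch: derive starting values for normalmixEM from k-means on a
# numeric vector d. Components are sorted by mean so results are
# comparable across runs.
kmeans_start <- function(d, k = 2) {
  km  <- kmeans(d, centers = k, nstart = 10)
  ord <- order(km$centers[, 1])                       # order components by mean
  list(mu     = as.numeric(km$centers[ord, 1]),       # component means
       sigma  = as.numeric(tapply(d, km$cluster, sd)[ord]),   # within-cluster SDs
       lambda = as.numeric(table(km$cluster)[ord]) / length(d)) # mixing proportions
}

# Usage with mixtools (not run here):
# st <- kmeans_start(d, 2)
# mixmdl <- normalmixEM(d, k = 2, lambda = st$lambda,
#                       mu = st$mu, sigma = st$sigma, fast = TRUE)
```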
Hi, I have been trying to model an approximately Gaussian mixture (image data: background and foreground intensities) using normalmixEM with 2 or 3 components.
While this generally works, I quite often get results that make no sense given the input data distribution. In particular, the input distribution is clearly bimodal with a 30/70 split in proportions, but the fitted components match neither of the modes, nor does the sum of the components fit the overall distribution well.
For certain data sets this happens about 1 in 5 times for k=2 and about 1 in 100-200 times for k=3. I can reproduce the behavior with specific random seeds for both k=2 and k=3. It does not seem to depend much on the input data itself: removing data points or subsampling from 100k down to 1k points does not change the behavior, and neither does removing the long tail to the right. I also tried varying the initial lambda values, without much effect.
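One way to flag the degenerate runs automatically is to compare the fitted component means against the empirical modes of a kernel density estimate. A rough sketch in base R (`modes_match` is a hypothetical helper; `fit_mu` would be `mixmdl$mu` from a normalmixEM fit):

```r
# Sketch: check whether each fitted component mean lies near one of the
# two highest modes of a kernel density estimate of the data.
modes_match <- function(d, fit_mu, tol = 1.0) {
  dens  <- density(d)
  ix    <- which(diff(sign(diff(dens$y))) == -2) + 1   # local maxima of KDE
  modes <- dens$x[ix]
  modes <- modes[order(dens$y[ix], decreasing = TRUE)][1:2]  # top 2 by height
  all(vapply(fit_mu, function(m) min(abs(m - modes)) < tol, logical(1)))
}
```

A run whose fitted means collapse onto a single peak (or drift into the tail) would fail this check, which makes it easy to count bad fits across a sweep of seeds.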
Tested in mixtools 1.2.0 on R 4.1.2 (openblas) and mixtools 2.0 on R 4.2.2 (standard R BLAS), both on Linux.
To reproduce (example data have been attached below):
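Since the attached archive cannot be inlined here, a surrogate data set with the same overall shape (bimodal, 30/70 split) can stand in for a self-contained reproduction. All distribution parameters below are assumptions, not values taken from the real data:

```r
# Surrogate for the attached image-intensity data: 70% background peak,
# 30% foreground peak. Parameters are illustrative assumptions only.
set.seed(1)
n <- 1e5
d <- c(rnorm(0.7 * n, mean = 2, sd = 0.4),   # background (~70%)
       rnorm(0.3 * n, mean = 5, sd = 0.8))   # foreground (~30%)

# With mixtools loaded, sweep seeds to hunt for degenerate fits:
# library(mixtools)
# for (s in 1:20) {
#   set.seed(s)
#   fit <- normalmixEM(d, k = 2)
#   cat(s, fit$mu, fit$lambda, "\n")
# }
```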
Results from the 100k data set (1k looks similar) for k=2:
Output from a 'good' run (100k, k=2):
example_data.zip