I was wondering if there are any more quantitative ways to determine the ideal number of clusters than the qPlot elbow graph. I noticed that in some cases picking the clear elbow (at q = 6, for example) results in a clustering where cluster 6 is only a few scattered spots, suggesting that one fewer cluster would be more natural. Would it be theoretically possible to use the raw NLL values from the qPlot to compute something like a gap statistic or another metric?
Sorry I wasn't able to respond sooner. Selecting the number of clusters is, in general, a pretty hard problem in clustering. I think the gap statistic could work in theory, but it may be too computationally intensive.
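For what it's worth, a cheaper option than the gap statistic is to locate the elbow numerically from the NLL values themselves. The snippet below is only a rough sketch of one common heuristic (the point farthest from the straight line joining the first and last points of the curve), not something the package provides. The function name `elbow_by_chord_distance` and the NLL numbers are made up for illustration, and it assumes you have already extracted one NLL value per q from the qPlot data yourself.

```python
import numpy as np

def elbow_by_chord_distance(qs, nll):
    """Return the q whose (q, NLL) point lies farthest from the chord
    joining the first and last points of the curve.

    Hypothetical helper for illustration; assumes at least two distinct
    q values and NLL values that are not all identical.
    """
    qs = np.asarray(qs, dtype=float)
    nll = np.asarray(nll, dtype=float)
    # Rescale both axes to [0, 1] so the answer does not depend on units.
    x = (qs - qs.min()) / (qs.max() - qs.min())
    y = (nll - nll.min()) / (nll.max() - nll.min())
    # Perpendicular distance from each point to the end-to-end chord.
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dx * (y - y[0]) - dy * (x - x[0])) / np.hypot(dx, dy)
    return int(qs[np.argmax(dist)])

# Made-up NLL values for q = 2..10, purely to show the call.
qs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
nll = [1000, 800, 650, 540, 500, 490, 482, 476, 471]
best_q = elbow_by_chord_distance(qs, nll)
print(best_q)
```

This only formalizes what the eye does on the elbow plot, so it will not by itself catch the case of a tiny, scattered extra cluster; checking the number of spots per cluster at the chosen q is still worthwhile.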
Thanks again for developing this package.