Closed wkopp closed 2 years ago
Hi @wkopp !
Here, the second derivative method is not meant to work with fewer than three points. The approach we use is based on the central difference (inspired by https://stackoverflow.com/questions/4471993/compute-the-elbow-for-a-curve-automatically-and-mathematically): the first derivative measures the slope of the line between two points on the likelihood curve (the change between two points), and the second derivative measures the difference between two consecutive slopes (the change of the change, which identifies the point of maximum curvature). You therefore need at least two slopes, that is, three points. I have added an error message for this.
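As a rough illustration of the idea above, here is a minimal base R sketch. It is not the package's actual implementation: the function name `second_derivative_elbow` and its arguments are hypothetical, it assumes evenly spaced topic numbers, and reading "maximum curvature" as the largest absolute second difference is an assumption.

```r
# Hypothetical helper (not part of the package): choose the topic number at
# the point of largest absolute second difference of the log-likelihood curve.
second_derivative_elbow <- function(topics, loglik) {
  stopifnot(length(topics) == length(loglik))
  # Two slopes are needed for one second difference, hence three points.
  if (length(topics) < 3) {
    stop("The second derivative needs at least three models (two slopes).")
  }
  # Sort by topic number so consecutive differences are meaningful.
  ord <- order(topics)
  topics <- topics[ord]
  loglik <- loglik[ord]
  # First differences: change between consecutive points (even spacing assumed).
  slopes <- diff(loglik)
  # Second differences: change between consecutive slopes.
  curvature <- diff(slopes)
  # curvature[i] is centred on topics[i + 1]; the end points have no value.
  topics[which.max(abs(curvature)) + 1]
}

# Illustrative values only, not real results.
second_derivative_elbow(c(10, 20, 30, 40, 50),
                        c(-1000, -800, -650, -640, -635))
# 30
```

With one or two topic numbers the function stops with an error, because no second difference can be formed.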
Thanks for reporting!
C
Hi,
would it then be possible to skip the first and second derivative computation when there are too few topic numbers? If the user already knows which number to choose, they currently still have to fit models for additional dummy topic numbers, which costs time and resources.
Thank you.
Sure! Just use method='maximum', select='Your number of topics'. Nevertheless, for proper topic selection I would recommend running models over a larger topic space.
Cheers,
C
I see. However, what is the purpose of having to specify method="maximum" if a specific topic number is selected anyway? In my opinion, it would be better not to require this argument: it isn't intuitive for a user, and it is not backward compatible, which would be nice to have and is probably possible in this case.
Another issue with the second derivative computation seems to be that it is used by default with runCGSModels, right? However, in the documentation of the type parameter of the selectModel method, you recommend against using the derivative method with collapsed Gibbs sampling. So perhaps selectModel should be used in runCGSModels and runWarpLDAModels with the respective recommended model selection criterion.
Best, Wolfgang
Hi!
Sometimes the differences between models are small (in likelihood or in the second derivative), so we recommend selecting the less complex model. If we only allowed automatic selection, it wouldn't be possible to switch manually to another suitable model.
selectModel is generally run after the models have been fitted (regardless of whether they are CGS or WarpLDA); we use derivative as the default to agree with the latest WarpLDA estimation. I will add a warning if the models are CGS-based.
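One way to picture the "prefer the less complex model" recommendation is the base R sketch below. It is only an illustration under stated assumptions: `simplest_near_best` is a hypothetical helper, not a package function, and the tolerance `tol` is an arbitrary threshold that the user would have to choose.

```r
# Hypothetical helper (not part of the package): among models whose
# log-likelihood is within `tol` of the best one, return the smallest
# topic number, i.e. the least complex candidate.
simplest_near_best <- function(topics, loglik, tol) {
  stopifnot(length(topics) == length(loglik), tol >= 0)
  # Keep every model that is "close enough" to the maximum likelihood.
  candidates <- topics[loglik >= max(loglik) - tol]
  min(candidates)
}

# Illustrative values only, not real results.
simplest_near_best(c(20, 30, 40), c(-700, -652, -650), tol = 5)
# 30
```

Setting tol = 0 reduces this to picking the maximum, so manual selection stays useful whenever the right tolerance is unclear.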
Cheers!
C
Hi,
selectModel fails when fitting the model with runCGSModels using a single number as the topic argument (e.g. topic=c(30)). The error message that I get is
I also get an error when running runCGSModels with only two topic numbers (e.g. topic=c(29,30)), but then the error is different:
When I run the model for more than two topic numbers (e.g. topic=c(29,30,31)), it seems to work.
Best, Wolfgang