AdvancedR-2021 / gghmm2


Starting parameters cause weird fit #1

Open maltenikolajsen opened 2 years ago

maltenikolajsen commented 2 years ago

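Abusing the example in the vignette by setting the starting values close to each other causes the fitted values to be odd. Specifically, I got a non-stationary fit with transition matrix c(0,0,1,1).

```r
library(gghmm2)

# Observed counts from the data set shipped with the package
X <- earthquakes$n

# Uniform initial distribution and uniform transition matrix
delta <- rep(.5, 2)
trans <- matrix(rep(.5, 4), 2)

# Two Poisson states whose starting means are nearly identical
HM <- HMM(stationary_dist = delta, transmision = trans,
          emission_function_names = c("dpois", "dpois"),
          parameters = list(list(lambda = 10), list(lambda = 11)))

# Fit with the EM algorithm and print the result
HM2 <- em(HM, X)
HM2
```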
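To make the reported result concrete, here is a small base-R sketch that inspects the matrix c(0,0,1,1). It assumes the values fill a 2x2 matrix column-wise, as `matrix()` does by default; `gamma_hat`, `row_sums` and `stat` are illustrative names and not part of gghmm2.

```r
# Reported fitted transition matrix, ASSUMING column-wise fill
# (this is an assumption about how the package stores it).
gamma_hat <- matrix(c(0, 0, 1, 1), nrow = 2)

# A valid transition matrix has rows that sum to 1.
row_sums <- rowSums(gamma_hat)

# Stationary distribution: left eigenvector of gamma_hat for eigenvalue 1,
# i.e. an ordinary eigenvector of the transpose, rescaled to sum to 1.
ev <- eigen(t(gamma_hat))
idx <- which.min(abs(ev$values - 1))
stat <- Re(ev$vectors[, idx])
stat <- stat / sum(stat)

print(gamma_hat)
print(stat)
```

Under that assumption both rows put all their mass on state 2, so state 2 is absorbing and the fit has effectively collapsed to a single state.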

maltenikolajsen commented 2 years ago

Package 'gghmm2' feedback

Purpose of package

The purpose of the package is quite clear. As promised, it handles the modeling and analysis of hidden Markov models (HMMs). It does not necessarily fill a gap, nor does it complement already existing packages.

Completeness

The package contains the following functions in regard to the modeling and analysis of HMMs

The above allows one to fit a hidden Markov model with custom marginal distributions to some observed emissions. While the functions may not produce the usual fitting statistics, they allow for flexible fitting and posterior analysis. As far as I can tell, the description of the package does not mention tidyverse, even though the class HMM is a subclass of tibble, an object from the tidyverse.

Code quality and sophistication

The package installed without problems. Things that challenge the package include setting the initial parameters of the fitting process close to one another and trying to fit with an odd marginal distribution (issues raised on GitHub). There are no tests in the package, as far as I can tell.

As mentioned, the HMM function returns an S3 object of class HMM. The only method that this class has is print. So S3 objects are used, but not to their full potential. Overall, the package makes use of R's vector-oriented structure and avoids loops for the most part, which is a plus. Still, there are plenty of places where one could speed up the code. In the function viterbi, one could avoid calling do.call in every iteration by simply computing a large matrix of emissions before entering the for loop. Furthermore, numerical stability is not great in the functions forward and backward, since the values underflow: $\beta_t = 0$ for small $t$ and $\alpha_t = 0$ for large $t$ once the series is long enough. Underflow is a problem since the function em relies on the forward and backward probabilities to fit. Consider computing everything in the EM algorithm on the log scale instead.
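A minimal sketch of both suggestions, assuming a Poisson HMM: the emission densities are computed once as a matrix before the loop, and the forward recursion is carried out on the log scale with a log-sum-exp helper. The functions `log_sum_exp` and `log_forward` and their arguments are hypothetical and are not part of gghmm2.

```r
# Numerically stable log(sum(exp(v))).
log_sum_exp <- function(v) {
  m <- max(v)
  if (!is.finite(m)) return(m)  # all entries are -Inf (or some entry is Inf)
  m + log(sum(exp(v - m)))
}

# Log-scale forward recursion for a Poisson HMM (illustrative only).
# x: observations, delta: initial distribution,
# gamma: transition matrix (rows sum to 1), lambda: one Poisson mean per state.
log_forward <- function(x, delta, gamma, lambda) {
  n <- length(x)
  m <- length(lambda)
  # All log emission densities at once: an n x m matrix, computed before the loop.
  log_emis <- outer(x, lambda, function(obs, lam) dpois(obs, lam, log = TRUE))
  log_gamma <- log(gamma)
  log_alpha <- matrix(-Inf, nrow = n, ncol = m)
  log_alpha[1, ] <- log(delta) + log_emis[1, ]
  for (tt in seq_len(n)[-1]) {
    for (j in seq_len(m)) {
      # log alpha_t(j) = log sum_i alpha_{t-1}(i) gamma_ij + log p_j(x_t)
      log_alpha[tt, j] <-
        log_sum_exp(log_alpha[tt - 1, ] + log_gamma[, j]) + log_emis[tt, j]
    }
  }
  log_alpha
}

# Usage on a long artificial series, where plain probabilities would underflow.
x_long <- rep(c(5, 20), 500)
la <- log_forward(x_long, delta = c(0.5, 0.5),
                  gamma = matrix(c(0.9, 0.2, 0.1, 0.8), 2),
                  lambda = c(5, 12))
loglik <- log_sum_exp(la[length(x_long), ])
```

The same precomputed emission matrix could be reused by viterbi, and the backward recursion can be treated analogously.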

Documentation and data

For the uninitiated user, say one who has little to no knowledge of HMMs, the descriptions of the functions may seem obscure. Consider adding some LaTeX specifying what exactly an HMM is. Someone with prior knowledge of HMMs would have no problem understanding the descriptions of the functions. There is a vignette in the package, which demonstrates how to fit an HMM to the included data earthquakes.rda. The vignette justifies the included functions with regard to modeling and analysis of the included data.
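As an illustration of the kind of definition that could go into the documentation, here is a short sketch. The notation ($C_t$, $X_t$, $\Gamma$, $\delta$, $p_j$) is an assumption chosen for this example and is not taken from the package.

An $m$-state HMM consists of an unobserved Markov chain $(C_t)_{t=1}^{T}$ on $\{1, \dots, m\}$ and observations $(X_t)_{t=1}^{T}$ such that

$$
P(C_1 = i) = \delta_i, \qquad
P(C_{t+1} = j \mid C_t = i) = \gamma_{ij}, \qquad
X_t \mid C_t = j \sim p_j ,
$$

where $\Gamma = (\gamma_{ij})$ is the transition matrix and, given $C_t$, the observation $X_t$ is independent of all other states and observations. With $P(x) = \operatorname{diag}(p_1(x), \dots, p_m(x))$ and $\mathbf{1}$ a column vector of ones, the likelihood of $x_1, \dots, x_T$ is

$$
L_T = \delta \, P(x_1) \, \Gamma \, P(x_2) \cdots \Gamma \, P(x_T) \, \mathbf{1} .
$$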

Conclusion and suggestions

Overall, the package is good at fitting HMMs using the EM algorithm. The posterior analysis is quick and easy to use and provides great insights. There are places that could use improvement. Here are some suggestions: