Open maltenikolajsen opened 2 years ago
The purpose of the package is quite clear. As promised, it handles the modeling and analysis of hidden Markov models (HMMs). It does not necessarily fill a gap, nor does it complement already existing packages.
The package contains the following functions in regard to the modeling and analysis of HMMs:

- `HMM`: A class factory that produces the S3 object `HMM`.
- `em`: A function that fits a HMM given observations.
- `forecast`: A function that returns the probability of observing a specific value at a given time (out of sample) conditioned on a HMM and observations.
- `local_decoder`: Calculates the most probable hidden state of the emission given a HMM.
- `viterbi`: Finds the most likely sequence of hidden states conditioned on emissions and a HMM.
- `state_prob`: Calculates the probability of being in a specific state at a time inside the sample conditioned on emissions and a HMM.

The above allows one to fit a hidden Markov model with custom marginal distributions to some observed emissions.
While the functions may not produce usual fitting statistics, they allow for a flexible fitting and posterior analysis.
As far as I can tell, the description of the package does not mention `tidyverse`.
Note that the class `HMM` is a subclass of `tibble`, an object from the aforementioned package.
The package installed without problems. Challenges to the package include setting the initial parameters of the fitting process close to one another or trying to fit with an odd marginal distribution (issues raised on GitHub). There are no tests in the package, as far as I can tell.
As mentioned, the `HMM` function returns the S3 object `HMM`.
The only method that this class has is `print`.
So S3 objects are used, but not to their full potential.
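As an illustration of what further S3 methods could look like, here is a minimal sketch of a `summary` method with its own `print` method. The class `toy_hmm` and its fields `delta` and `trans` are hypothetical stand-ins; the package's `HMM` object is a `tibble` subclass whose internal layout I have not assumed here.

```r
# Toy stand-in for the package's class; the field names are assumptions.
toy_hmm <- function(delta, trans) {
  structure(list(delta = delta, trans = trans), class = "toy_hmm")
}

# summary() method: returns a small object describing the model.
summary.toy_hmm <- function(object, ...) {
  row_sums <- unname(rowSums(object$trans))
  out <- list(
    n_states = length(object$delta),
    rows_sum_to_one = isTRUE(all.equal(row_sums, rep(1, nrow(object$trans))))
  )
  class(out) <- "summary.toy_hmm"
  out
}

# print() method for the summary object.
print.summary.toy_hmm <- function(x, ...) {
  cat("Hidden Markov model with", x$n_states, "states\n")
  cat("Transition rows sum to one:", x$rows_sum_to_one, "\n")
  invisible(x)
}

s <- summary(toy_hmm(c(0.5, 0.5), matrix(0.5, 2, 2)))
print(s)
```

The same pattern would extend to generics such as `plot`, `predict`, or `logLik`.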
Overall, the package makes use of `R`'s vector-oriented structure and avoids loops for the most part, which is a plus.
Still, there are plenty of places where one could speed up the code.
In the function `viterbi`, one could avoid calling `do.call` per loop by simply computing a large matrix of emissions before entering the for loop.
Furthermore, numerical stability is not great in the functions `forward` or `backward` since $\beta_t = 0$ for small $t$ and $\alpha_t = 0$ for large $t$. (Read: Underflow.)
Underflow is a problem since the function `em` relies on the forward and backward probabilities to fit.
Consider having everything computed in terms of `log` in the EM-algorithm instead.
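To illustrate the suggested remedy, here is a minimal sketch of a forward recursion carried out on the log scale with the log-sum-exp trick. The function names `log_sum_exp` and `log_forward` are hypothetical, and the inputs are assumed to be an initial distribution, a transition matrix, and a T x m matrix of emission densities; this is not the package's implementation.

```r
# Numerically stable log(sum(exp(v))).
log_sum_exp <- function(v) {
  mx <- max(v)
  if (!is.finite(mx)) return(mx)  # e.g. all entries are -Inf
  mx + log(sum(exp(v - mx)))
}

# Forward recursion on the log scale.
# delta: initial distribution (length m); trans: m x m transition matrix;
# b: T x m matrix with b[t, j] = density of observation t under state j.
# Returns a T x m matrix holding log(alpha_t(j)).
log_forward <- function(delta, trans, b) {
  n <- nrow(b)
  m <- length(delta)
  log_trans <- log(trans)
  log_b <- log(b)
  log_alpha <- matrix(-Inf, n, m)
  log_alpha[1, ] <- log(delta) + log_b[1, ]
  for (t in seq_len(n)[-1]) {
    for (j in seq_len(m)) {
      log_alpha[t, j] <-
        log_sum_exp(log_alpha[t - 1, ] + log_trans[, j]) + log_b[t, j]
    }
  }
  log_alpha
}
```

The log-likelihood is then `log_sum_exp(log_alpha[nrow(log_alpha), ])`. A backward recursion can be written the same way, and the E-step quantities can be formed by adding log terms and exponentiating only after normalisation.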
For the uninitiated user, say one that has little to no knowledge of HMMs, the description of the functions may seem obscure.
Consider adding some LaTeX specifying what exactly a HMM is.
One with prior knowledge of HMM would have no problem understanding the descriptions of the functions.
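As a sketch of the kind of definition meant here: the notation below ($C_t$ for the hidden state, $X_t$ for the emission, $\delta$, $\Gamma = (\gamma_{ij})$, $p_i$) is an assumption following common textbook conventions and is not taken from the package's documentation. An $m$-state HMM consists of an unobserved Markov chain $C_1, C_2, \dots$ on $\{1, \dots, m\}$ and observations $X_1, X_2, \dots$ such that

$$
\begin{aligned}
\Pr(C_1 = i) &= \delta_i, \\
\Pr(C_t = j \mid C_{t-1} = i) &= \gamma_{ij}, \\
\Pr(X_t = x \mid C_t = i) &= p_i(x),
\end{aligned}
$$

where, given $C_t$, the emission $X_t$ is independent of all other states and emissions. Writing $P(x) = \mathrm{diag}(p_1(x), \dots, p_m(x))$, the forward probabilities are $\alpha_t = \delta P(x_1) \Gamma P(x_2) \cdots \Gamma P(x_t)$, and the likelihood of $x_1, \dots, x_T$ is $L_T = \alpha_T \mathbf{1}^\top$.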
There is a vignette in the package, which demonstrates how to fit a HMM to the included data `earthquakes.rda`.
The vignette justifies the included functions with regard to modeling and analysis of the included data.
Overall, the package is good with regard to fitting HMMs using the EM algorithm. The posterior analysis is quick and easy to use and provides great insights. There are places that could use improvements. Here are some suggestions
HMM
.
Abusing the example in the vignette by setting starting values close to each other causes the fitted values to be odd. Specifically, I got a non-stationary fit with transition matrix `c(0,0,1,1)`.

```r
X <- earthquakes$n
delta <- rep(.5, 2)
trans <- matrix(rep(.5, 4), 2)
HM = HMM(stationary_dist = delta,
         transmision = trans,
         emission_function_names = c("dpois", "dpois"),
         parameters = list(list(lambda = 10), list(lambda = 11)))
HM2 = em(HM, X)
HM2
```