Closed by orijitghosh 7 months ago
Hi, it looks like the problem is that the distribution of the data is very different from being normal (or a mixture of normal distributions).
As you pointed out, there are many observations that are equal to 0.001, but beyond this, the variable normact has only 8 unique values in the data set. I noticed that almost all observations are multiples of 0.2994012 (except the observations equal to 0.001). For this reason, the data look a little like a time series of integers, so one thing you could do is scale the observations to be {0, 1, 2, ..., 9}, and then fit a Poisson HMM with zero inflation. This would treat the observations as discrete (as opposed to normal or gamma HMMs), but ordered (as opposed to a categorical HMM). The zero inflation will account for the high proportion of zeros. You will have to decide whether this makes sense or not for your application.
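A quick way to check for this kind of discreteness is sketched below in base R. The vector is made up for illustration and only stands in for data$normact; the 0.01 tolerance is an arbitrary assumption.

```r
# Made-up values standing in for data$normact (not the real data):
# multiples of 0.2994012, plus 0.001 where zeros were replaced
step <- 0.2994012
normact <- c(0.001, 0.001, 0.2994012, 0.001, 0.8982036,
             0.5988024, 0.001, 1.4970060)

# How many distinct values are there, and how often does each occur?
n_unique <- length(unique(normact))
table(normact)

# Ratio to the step size; ratios close to whole numbers suggest count-like data
ratio <- normact / step
is_count_like <- all(abs(ratio - round(ratio)) < 0.01)
is_count_like
```

If the ratios are all close to whole numbers, rounding them gives the integer series used in the model below.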
library(hmmTMB)
# Transform observations to integers
data$normact <- round(data$normact / 0.2994012)
# Define 2-state hidden process
hid <- MarkovChain$new(data = data, n_states = 2)
# Define zero-inflated Poisson model for observations
dists <- list(normact = "zipois")
# Starting values: one rate and one zero-inflation probability per state
par0 <- list(normact = list(rate = c(0.5, 1.5), z = c(0.9, 0.1)))
obs <- Observation$new(data = data, dists = dists,
                       n_states = 2, par = par0)
# Create and fit the HMM
hmm <- HMM$new(obs = obs, hid = hid)
hmm$fit()
It looks like this identifies one state that captures almost all the zeros (and has a high zero-inflation parameter z), and another state that captures larger observations.
> lapply(hmm$par(), round, 3)
$obspar
, , 1
             state 1 state 2
normact.rate   0.790   1.215
normact.z      0.993   0.089
$tpm
, , 1
        state 1 state 2
state 1   0.993   0.007
state 2   0.026   0.974
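To make these estimates easier to interpret, here is a small base R sketch of what they imply. It assumes the usual zero-inflated Poisson parameterisation, where z is the probability of an extra zero, and that each row of the transition matrix holds the probabilities of leaving that state. The numbers are copied by hand from the output above, not pulled from the fitted object.

```r
# Estimates copied from the printed output above
rate <- c(0.790, 1.215)
z <- c(0.993, 0.089)
tpm <- matrix(c(0.993, 0.007,
                0.026, 0.974), nrow = 2, byrow = TRUE)

# Zero-inflated Poisson: P(X = 0) = z + (1 - z) * exp(-rate)
p_zero <- z + (1 - z) * dpois(0, rate)
round(p_zero, 3)   # about 0.996 and 0.359

# Expected number of consecutive time steps spent in each state
# (dwell times are geometric in a homogeneous Markov chain)
dwell <- 1 / (1 - diag(tpm))
round(dwell)       # about 143 and 38

# Long-run proportion of time in each state (2-state stationary distribution)
delta <- c(tpm[2, 1], tpm[1, 2]) / (tpm[1, 2] + tpm[2, 1])
round(delta, 3)    # about 0.788 and 0.212
```

Under these assumptions, state 1 produces a zero almost every time step and is occupied roughly 79% of the time, which is in line with the high proportion of zeros in the data.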
I noticed that you tried to use four states in your code. The more states you use, the more unstable the model fitting will be. I would generally recommend starting from a simple 2-state model, and building complexity from there if possible. But there might not be clear enough clusters in the data to identify four states here.
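One rough way to see why extra states make fitting harder is to count the free parameters. The sketch below assumes a single observed variable with two parameters per state (such as rate and z here), no covariates, and leaves out the initial state distribution; n_par is a hypothetical helper, not part of hmmTMB.

```r
# Free parameters in an N-state HMM with one two-parameter observation
# distribution: N * (N - 1) transition probabilities plus 2 * N
# observation parameters (initial state distribution not counted)
n_par <- function(n_states) {
  n_states * (n_states - 1) + 2 * n_states
}

counts <- sapply(2:4, n_par)
names(counts) <- paste(2:4, "states")
counts   # 6, 12 and 20 parameters
```

Going from two to four states more than triples the number of parameters to estimate from the same series, most of which consists of zeros.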
Hi Theo,
Thank you for the detailed reply. This looks like a problem inherent to the data. Unfortunately, for this example, I picked an individual with very little activity throughout the day (>90% of timepoints are zero), so when the activity counts are normalized, the values look like multiples of 0.2994012. I then replaced the zeros with small values. Around 5% of my data will be individuals like this. For most of my data, I used a Weibull distribution with 4 states, which works, and the inferred states make sense. I understand the solution you are suggesting. Maybe for these edge cases, a zero-inflated Poisson model will be better.
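A rough sketch of how such individuals could be flagged before choosing a model is shown below. The data frame, its column names, and the 0.8 cutoff are all hypothetical and only for illustration.

```r
# Hypothetical raw activity counts for two individuals (illustrative values only)
activity <- data.frame(
  ID = rep(c("a", "b"), each = 10),
  count = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 3,   # mostly inactive
            2, 5, 0, 1, 4, 0, 3, 6, 2, 1)   # regularly active
)

# Proportion of zero counts per individual
zero_frac <- tapply(activity$count == 0, activity$ID, mean)

# Individuals above the (arbitrary) cutoff are candidates for the
# zero-inflated Poisson model instead of the Weibull model
cutoff <- 0.8
sparse_ids <- names(zero_frac)[zero_frac > cutoff]
sparse_ids
```

The check is run on the raw counts, before normalizing or replacing zeros with small values, so the zeros are still exact.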
Thank you!
Hi, I was trying to use hmmTMB on my data (1440 timepoints, one for each minute of the day), which has a large number of zeros or very small values (1e-03). By a large number, I mean sometimes almost 80% of the timepoints. However, every time I have been getting this error:
Data:
Code:
Any idea what may be causing this?
Thank you for the nice documentation and examples!