I wouldn't normally make this issue for fear it feels nit-picky, but given that speed is a primary focus of this package I thought I would mention it to see if there is interest in me making this change.
Currently the state prediction step is:
trans = transition_matrix(hmm, control_seq[t])
αₜ, αₜ₊₁ = view(α, :, t), view(α, :, t + 1)
mul!(αₜ₊₁, transpose(trans), αₜ)
This means that this transposition is performed every time step leading to inefficient memory access at each filtering step. For the time-homogenous HMM it might make sense to cache a copy of the transposed matrix.
A rough implementation of this change led to a roughly 10–15% reduction on the package benchmarks for the forward algorithm on my laptop.
I wouldn't normally make this issue for fear it feels nit-picky, but given that speed is a primary focus of this package I thought I would mention it to see if there is interest in me making this change.
Currently the state prediction step is:
This means that this transposition is performed every time step leading to inefficient memory access at each filtering step. For the time-homogenous HMM it might make sense to cache a copy of the transposed matrix.
A rough implementation of this change led to a roughly 10–15% reduction on the package benchmarks for the forward algorithm on my laptop.