Avoid repeated transposition when using time-homogenous transition matrix

I wouldn't normally make this issue for fear it feels nit-picky, but given that speed is a primary focus of this package I thought I would mention it to see if there is interest in me making this change.

Currently the state prediction step is:

trans = transition_matrix(hmm, control_seq[t])
αₜ, αₜ₊₁ = view(α, :, t), view(α, :, t + 1)
mul!(αₜ₊₁, transpose(trans), αₜ)

This means that this transposition is performed every time step leading to inefficient memory access at each filtering step. For the time-homogenous HMM it might make sense to cache a copy of the transposed matrix.

A rough implementation of this change led to a roughly 10–15% reduction on the package benchmarks for the forward algorithm on my laptop.

gdalle / HiddenMarkovModels.jl

Avoid repeated transposition when using time-homogenous transition matrix #106