Off-By-One Errors - Githubissues

THargreaves commented 2 months ago

I just noticed that we have an off-by-one error in the interface which hasn't been noticed yet since we've only implemented homogenous models.

Currently, simulate(dyn, step) is defined to transition to the next time step and therefore at time i we must run it with step=i-1. On the other hand simulate(obs, step) would be ran with step=i. I didn't notice this distinction when writing the Kalman filter example leading to incorrect code in the non-homogeneous case.

It would be much more consistent to have simulate(dyn, ...) be defined as the transition to the current step from the previous.

The downside of this is that it would require us to start with step=2 and never call the function with step=1. I feel like that's asking for trouble. One way around this would be to separate the initialisation of the SSM from the rest of the transition/observe loop as in the following diagram.

I believe this is a less common structure for a state space model but I have seen it been done before (in fact, this is what Kalman.jl does). If "unobserved" initialisation is not desired, one could simply but in a dummy initialiser, or have the first transition be the identity.

I'm happy to make this change but thought it was worth getting the opinions of @FredericWantiez, @charlesknipp first.

THargreaves commented 2 months ago

As a side-note. This will also considerably clean up the filtering code as we don't have to do a strange

if t==1
    # initialise
elseif t>1
    # step
end

Instead it becomes a simple

# initialise
for t = 1:T
    # step
end

charlesknipp commented 2 months ago

This is definitely the sensible approach. I employ something similar in my filtering code with the prior function to initialize the filtered states.

FredericWantiez commented 2 months ago

I thought we already had a simulate(dyn::LatentDynamics, extra) for the first step: https://github.com/TuringLang/SSMProblems.jl/blob/main/src/SSMProblems.jl#L105C1-L107C4

THargreaves commented 2 months ago

I should have explained myself a bit clearer. The current process is

# t = 1
simulate(dyn::LatentDynamics, extra)
simulate(obs::ObservationProcess, 1, extra)
# t = 2
simulate(dyn::LatentDynamics, 1, state, extra)
simulate(obs::ObservationProcess, 2, state, extra)
# t = 3
simulate(dyn::LatentDynamics, 2, state, extra)
simulate(obs::ObservationProcess, 3, state, extra)

Note how the time steps are different for latent dynamics and observations. By redefining simulate for latent dynamics to be the transition from prev -> current rather than current -> next as it is now we get

# t = 1
simulate(dyn::LatentDynamics, extra)
simulate(obs::ObservationProcess, 1, extra)
# t = 2
simulate(dyn::LatentDynamics, 2, state, extra)
simulate(obs::ObservationProcess, 2, state, extra)
# t = 3
simulate(dyn::LatentDynamics, 3, state, extra)
simulate(obs::ObservationProcess, 3, state, extra)

This is still a bit clunky because simulate(dyn::LatentDynamics, 1, state, extra) (with time step 1) is never called. Instead I suggest we have

# t = 0
simulate(dyn::LatentDynamics, extra)
# t = 1
simulate(dyn::LatentDynamics, 1, state, extra)
simulate(obs::ObservationProcess, 1, state, extra)
# t = 2
simulate(dyn::LatentDynamics, 2, state, extra)
simulate(obs::ObservationProcess, 2, state, extra)

i.e. have X_0 as its own non-observed node as in the SSM diagram.

In fact, we were using this exact diagram in the docs even though our method definitions ignored the existence of X_0.

yebai commented 2 months ago

t = 1 simulate(dyn::LatentDynamics, extra) simulate(obs::ObservationProcess, 1, extra)

Why do we want to simulate observation for the initialisation state x_0?

# t = 0
simulate(dyn::LatentDynamics, extra)
# t = 1
simulate(dyn::LatentDynamics, 1, state, extra)
simulate(obs::ObservationProcess, 1, state, extra)
# t = 2
simulate(dyn::LatentDynamics, 2, state, extra)
simulate(obs::ObservationProcess, 2, state, extra)

This looks more natural to me than alternatives.

THargreaves commented 2 months ago

Very much agree that this looks more natural. This does lead to the issue you mention here regarding how we pass in the extras if we want to include those for the initialisation too.

This approach gives us extras for time steps 0, 1, ..., T so our options are:

Use a zero-indexed array
Use a dictionary (probably quite slow)
Split into initial_extra, extras
Use an array of length T+1 and have all the extras be offset (very confusing in my opinion)

yebai commented 2 months ago

Split into initial_extra, extras

It sounds reasonable to have extra0, and x0 and handle them separately from extras and xs.

TuringLang / SSMProblems.jl

Off-By-One Errors #54