Armavica / rebop

Fast stochastic simulator for chemical reaction networks
https://armavica.github.io/rebop/
MIT License

Different results when changing nb_steps #26

Open maurosilber opened 2 months ago

maurosilber commented 2 months ago

Hi,

I get different results when running the example from the README with differing numbers of steps (and a fixed seed).

import rebop
from matplotlib.cm import viridis as cmap

sir = rebop.Gillespie()
sir.add_reaction(1e-4, ["S", "I"], ["I", "I"])
sir.add_reaction(0.01, ["I"], ["R"])

values = {"S": 999, "I": 1}
tmax = 250
seed = 1

ax = None
steps = [0, 10, 50, 100, 500, 1000]
for step, color in zip(steps, cmap.resampled(len(steps)).colors):
    ax = (
        sir.run(values, tmax=tmax, nb_steps=step, seed=seed)
        .to_dataframe()
        .sort_index(axis="columns")
        .plot(ax=ax, color=color, subplots=True, legend=False)
    )
ax[1].legend(steps, title="Steps", bbox_to_anchor=(1, 0.5), loc="center left")

[Image: S, I, and R trajectories over time for each value of nb_steps; the trajectories differ despite the fixed seed.]

I assumed that nb_steps would save a subset of the solution by interpolating at equispaced points.

I think the issue could be in the use of this method: https://github.com/Armavica/rebop/blob/5ecd326afec91846c28320fcce208645eb38c804/src/lib.rs#L347

It can advance the simulation time without performing any reaction: https://github.com/Armavica/rebop/blob/5ecd326afec91846c28320fcce208645eb38c804/src/gillespie.rs#L278-L281

The simulation is then restarted from that intermediate tmax and draws a different time jump, instead of keeping the original jump that would have taken the simulation to a time self.t > tmax.
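
As an illustration, here is a toy single-reaction sketch (not rebop's actual code; run_to and the rate used are made up for this example) of why that matters: discarding the waiting time that overshoots a checkpoint and redrawing it from the checkpoint consumes the random number stream differently, so the realized trajectory changes even though the seed is the same.

import random


def run_to(t_end, rate, rng, t0=0.0):
    """Count jumps of a single exponential clock up to t_end.

    The jump that overshoots t_end is discarded, mimicking a restart at
    each reporting checkpoint.
    """
    t, n = t0, 0
    while True:
        dt = rng.expovariate(rate)
        if t + dt > t_end:
            return n  # the overshooting waiting time is thrown away
        t += dt
        n += 1


# One uninterrupted run to t = 100 ...
n_single = run_to(100.0, rate=0.1, rng=random.Random(1))

# ... versus two chained runs with a checkpoint at t = 50, same seed.
rng = random.Random(1)
n_checkpointed = run_to(50.0, rate=0.1, rng=rng) + run_to(
    100.0, rate=0.1, rng=rng, t0=50.0
)

print(n_single, n_checkpointed)  # the counts generally differ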

Does this process still produce a different but correct sample of the master equation?

Thanks!

maurosilber commented 1 month ago

It seems to produce the same distribution (at least for the SIR model).

I performed 1000 simulations of the SIR model using:

  1. nb_steps=250 (blue)
  2. nb_steps=0 (orange), which was later interpolated at the same time points as in 1.

and compared the distributions at the same time points:

[Image: quantiles of S, I, and R over time for nb_steps=250 (solid) and for interpolated nb_steps=0 (dashed); the two sets of curves coincide.]

where each curve corresponds to a different quantile.

Here is the script to produce the figure:

import numpy as np
import rebop
import xarray

sir = rebop.Gillespie()
sir.add_reaction(1e-4, ["S", "I"], ["I", "I"])
sir.add_reaction(0.01, ["I"], ["R"])

values = {"S": 999, "I": 1}
tmax = 250

quantiles = np.linspace(0, 1, 11)

seeds = range(1000)
x_steps = (
    xarray.merge(
        [
            sir.run(values, tmax=tmax, nb_steps=250, seed=seed).to_dataarray(
                "species", name=seed
            )
            for seed in seeds
        ]
    )
    .to_dataarray("seed")
    .quantile(quantiles, dim="seed")
)
x_interpolated = (
    xarray.merge(
        [
            sir.run(values, tmax=tmax, nb_steps=0, seed=seed)
            .interp(time=x_steps.time)
            .to_dataarray("species", name=seed)
            for seed in seeds
        ]
    )
    .to_dataarray("seed")
    .quantile(quantiles, dim="seed")
)

grid = x_steps.plot.line(x="time", col="species", add_legend=False, color="C0")
for ax, s in zip(grid.axes.flat, grid.name_dicts.flat):
    x_interpolated.sel(s).plot.line(
        x="time", ax=ax, add_legend=False, color="C1", linestyle="--"
    )

Armavica commented 1 month ago

Hi, thank you for taking the time to report this!

I was aware of this behavior; it is a consequence of the way I implemented the algorithm. But all the samples returned are mathematically correct traces, as if you were using different seeds.
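
This is consistent with the memorylessness of the exponential waiting times: given that no reaction has fired by a reporting time, the residual waiting time has the same distribution as a freshly drawn one, so redrawing at the checkpoint does not bias the statistics. A minimal numerical sketch of that property (the rate, checkpoint, and sample size below are arbitrary):

import random

rng = random.Random(0)
rate, checkpoint, n = 0.2, 3.0, 100_000

# Keep the original draw: condition on the waiting time overshooting the checkpoint.
kept = []
while len(kept) < n:
    t = rng.expovariate(rate)
    if t > checkpoint:
        kept.append(t - checkpoint)

# Redraw a fresh waiting time at the checkpoint.
redrawn = [rng.expovariate(rate) for _ in range(n)]

# Both residual times follow the same Exp(rate) distribution;
# both means are close to 1 / rate = 5.
print(sum(kept) / n, sum(redrawn) / n)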

This is something that I have been wanting to change though, because it breaks the principle of least surprise. I will rewrite this part for the next release, to make the trace independent of nb_steps.
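
One possible way to achieve that (a hypothetical sketch in the same toy single-reaction setting as above, in Python rather than the library's Rust internals, and not a commitment to any particular implementation) is to simulate the jump process once per seed and, at each reporting time, record the last state reached before that time instead of restarting the simulation:

import random


def simulate(rate, tmax, rng):
    """Yield (time, count) after every jump of a single exponential clock."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(rate)
        if t > tmax:
            return
        n += 1
        yield t, n


def record(rate, tmax, nb_steps, seed):
    """Sample one trajectory at nb_steps + 1 equispaced reporting times (nb_steps >= 1)."""
    reporting_times = [tmax * k / nb_steps for k in range(nb_steps + 1)]
    events = simulate(rate, tmax, random.Random(seed))
    next_event = next(events, None)
    out, n = [], 0
    for t_report in reporting_times:
        # advance through every jump that happens before this reporting time
        while next_event is not None and next_event[0] <= t_report:
            n = next_event[1]
            next_event = next(events, None)
        out.append((t_report, n))
    return out


# Wherever the reporting grids overlap, the recorded values agree,
# because the jump trajectory is simulated only once per seed.
print(record(0.1, 100.0, nb_steps=4, seed=1))
print(record(0.1, 100.0, nb_steps=8, seed=1))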