Reproducible Monte Carlo problem

mattborghi commented 4 years ago

Hello everyone,

I wanted to know if my current approach at making a reproducible Monte Carlo problem is correct.

First we take a seed as input value given by the user and the we generate n_paths new seeds, each one for creating a new MC path later on.

using StochasticDiffEq, DiffEqNoiseProcess
using RandomNumbers

# First take a user input seed
seed = UInt(3)
n_paths = 100
# Here we are defining the generator of seeds
rng = Xorshifts.Xoroshiro128Plus(seed)
# We are generating N_paths seeds to feed a RNG that each will generate a path
seed_values = Xorshifts.rand(rng, Int64, n_paths)

Then we define our problem (Just took the scalar example and added a noise process).

α = 1
β = 1
u₀ = 1 / 2
f(u, p, t) = α * u
g(u, p, t) = β * u
rng = Xorshifts.Xoroshiro128Plus(seed_values[1]) # to be overwritten...
noise = WienerProcess(0.0, 0.0, rng = rng, reseed = false) # Specify the noise
dt = 1 // 2^(4)
T = 1.0
tspan = (0.0, T)
prob = SDEProblem(f, g, u₀, tspan, noise = noise)

Finally, the most important part: in our EnsembleProblem we define a funcion where we are going to modify it by creating a new RNG for each path using the seed in the first step.

function prob_func(prob, i, repeat)
    prob.noise.rng = Xorshifts.Xoroshiro128Plus(seed_values[i])
    prob
end

ensembleprob = EnsembleProblem(prob, prob_func = prob_func)

sol = solve(ensembleprob, EM(), trajectories = n_paths, dt = dt)

So, to sum up, do you consider this a correct solving approach?, because it's working but I don't know if it is the right way to solve this problem.

Thanks in advance.

ChrisRackauckas commented 4 years ago

I think that's over complicated. I'd just do:

function prob_func(prob, i, repeat)
    remake(prob,seed=seed_values[i])
end

rvignolo commented 4 years ago

Is this the only way an EnsambleProblem for Stochastic simulations can be reproducible? I mean, using this method, DifferentialEquations is instantiating trajectories random number generators and, usually, what one would want is to get all the random numbers from the same random number generator, irrespective of the number of trajectories.

I also don't know if using different random number generators is correct! Maybe someone can provide some insight 😄.

Thanks!

ChrisRackauckas commented 4 years ago

It by default uses a localized rng with a random seed. The localization is for thread-safety, since a lot of uses are using multithreading which can make things... weird.

rvignolo commented 4 years ago

The localization is for thread-safety, since a lot of uses are using multithreading which can make things... weird.

Okay, that makes sense! I mean, the localization makes sense.

It by default uses a localized rng with a random seed.

Why does it use random seeds? This makes the trajectories unreproducible between simulations. Also, I believe that using a seed in an EnsambleProblem makes all the trajectories equal since it uses the same seed for all rng objects.

The approach proposed by @mattborghi seems better: if the user provides a seed, there are several seeds generated (one for each trajectory) and the simulation is reproducible and thread-safe. Maybe we can think which is the best way to generate all the seeds given one seed: a rng object is one approach, another one would be to use an equation.

Thanks Chris.

ChrisRackauckas commented 4 years ago

By default it uses random seeds, but if you pass seed=1, it'll use random seed 1. It's already syntactic sugar over what was shown here.

rvignolo commented 4 years ago

What do you mean by passing seed = 1? Do you mean the following?

function prob_func(prob, i, repeat)
    remake(prob,seed=1)
end

On the other hand, did you mean seed=0? I am asking because of: https://github.com/SciML/StochasticDiffEq.jl/blob/d1917f4524e41f25647fc5e942f7c6a1eddf4418/src/solve.jl#L246

If this is what you mean, this is not the same as what was shown here by @mattborghi because in this last case the paths won't be the same when comparing two different simulations. They will always be different.

Lastly, maybe you mean that using seed=1 in an EnsambleProblem makes different trajectories but the same trajectories between two simulations. This is not my experience.

I am sorry to be tedious about this, but I just want to understand this. Thank you!

ChrisRackauckas commented 4 years ago

Making the seed non-zero makes the Brownian process the same (it sets the seed). Seed=0, the default, uses a random seed.

drrobots commented 4 years ago

Making the seed non-zero makes the Brownian process the same (it sets the seed). Seed=0, the default, uses a random seed.

I had the seed set to zero in my code and could not figure out why I had no reproducibility. Is this mentioned somewhere in the documentation?

ChrisRackauckas commented 4 years ago

I would've thought it was, but now that I search the docs it isn't. This should really get documented. Thanks for bringing that up.

SciML / StochasticDiffEq.jl

Reproducible Monte Carlo problem #276