SED-ML / sed-ml

Simulation Experiment Description Markup Language (SED-ML)
http://sed-ml.org

Define what to do with the simulator rng seed during a SED-ML experiment #30

Closed luciansmith closed 3 years ago

luciansmith commented 7 years ago

Chris Myers was under the impression that setting 'reset=true' on a repeated task would require resetting the random number seed for the simulation. I would have thought that the random number seed would never be reset at any point. Clearly, we need to state explicitly in the spec what to do in this situation, and for the course of a SED-ML experiment in general.

If it turns out that people might actually want to reset the random number seed at some point during a SED-ML experiment, we then need a way to let people do that, perhaps through the use of a KiSAO term, or with a SED-ML defined csymbol (like time).

matthiaskoenig commented 7 years ago

This is defined: one can simply set the seed as an AlgorithmParameter. I even implemented a phrasedml example for that.

See here https://tellurium.readthedocs.io/en/latest/notebooks.html#repeatedstochastic

luciansmith commented 7 years ago

Actually, that illustrates my point: according to Chris's thought, since the seed was an attribute of timecourse1, you would set the seed to 1003 every time you repeated the simulation!

This is, of course, normally absurd, since it would mean that you would get the exact same simulation for every repeat. But it appears we need to state this explicitly.
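The two readings of "seed on the Simulation inside a RepeatedTask" can be sketched in plain Python, using the stdlib `random` module as a stand-in for a simulator's RNG (the seed value 1003 is taken from the example above; everything else here is illustrative, not SED-ML semantics):

```python
import random

def run_stochastic_trace(rng, n=5):
    # Stand-in for one stochastic time course: n draws from the RNG.
    return [rng.random() for _ in range(n)]

# Reading 1: the seed is re-applied before every repeat (Chris's
# interpretation of reset=true). Every repeat is identical.
reseeded = []
for _ in range(3):
    rng = random.Random(1003)          # seed reset on every iteration
    reseeded.append(run_stochastic_trace(rng))
assert reseeded[0] == reseeded[1] == reseeded[2]

# Reading 2: the seed is applied once for the whole experiment.
rng = random.Random(1003)              # seeded once, up front
seeded_once = [run_stochastic_trace(rng) for _ in range(3)]
assert seeded_once[0] != seeded_once[1]   # repeats differ...

# ...but rerunning the whole experiment reproduces it exactly.
rng = random.Random(1003)
rerun = [run_stochastic_trace(rng) for _ in range(3)]
assert seeded_once == rerun
```

Under reading 1 the repeats collapse into copies of one trace; under reading 2 the repeats differ but the experiment as a whole is repeatable.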

luciansmith commented 7 years ago

Wait, no, you actually did reset the seed every time, and got the same result because of this. Hmm.

matthiaskoenig commented 7 years ago

It shows both: once with setting the seed, where all stochastic runs are the same, and once without setting the seed, where all runs are different.

luciansmith commented 7 years ago

So, it looks like there's no way to set the seed once for the SED-ML experiment as a whole?

matthiaskoenig commented 7 years ago

[1]

So, it looks like there's no way to set the seed once for the SED-ML experiment as a whole?

Yes, there is nothing like that. We would need something like a global listOfAlgorithmParameters which defines once which algorithmParameters are applied before anything is calculated. The individual simulations could then override the global parameters. This would be a nice addition, because often a single simulation experiment runs all its simulations with very similar AlgorithmParameters, so this would remove redundancy.
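A minimal Python sketch of how such precedence could resolve, keyed by KiSAO ID (the dict-based resolution and the parameter values are illustrative assumptions about the proposal, not existing spec behavior):

```python
# Hypothetical resolution of the proposed global listOfAlgorithmParameters:
# simulation-local parameters override global ones, keyed by KiSAO ID.
global_params = {
    "KISAO:0000488": "1003",   # seed, applied experiment-wide
    "KISAO:0000211": "1e-12",  # absolute tolerance, a sensible default
}
local_params = {
    "KISAO:0000211": "1e-9",   # this simulation overrides the tolerance
}

# Local entries shadow global ones; untouched globals pass through.
effective = {**global_params, **local_params}
assert effective["KISAO:0000211"] == "1e-9"   # local wins
assert effective["KISAO:0000488"] == "1003"   # global seed still applies
```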

An important note: for a global seed to guarantee reproducibility with a given software and version, there must be a clearly defined number of random draws from the distribution before each specific simulation is executed, i.e. the order of draws across simulations must be deterministic. Because tasks are allowed to execute in any order, i.e. there is no guarantee in which order they are executed (this is even stated in the spec), this could be an issue for the reproducibility of the total simulation experiment. We have to add a sentence about task order, something in the direction of:

"Tasks can be executed in any order by a software implementation, but the order must be reproducible, i.e. if the same simulation experiment is run twice, the order of execution must be identical."
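Why task order matters under a single global seed can be shown with a small Python sketch (again using stdlib `random` as a stand-in; the task names and draw counts are made up for illustration):

```python
import random

def run_task(rng):
    # Each task consumes a slice of the shared random stream.
    return [rng.random() for _ in range(3)]

def run_experiment(task_order, seed=42):
    rng = random.Random(seed)  # one globally seeded RNG shared by all tasks
    return {name: run_task(rng) for name in task_order}

# Same global seed, different execution order: per-task results differ,
# because each task consumes a different slice of the random stream.
a = run_experiment(["task1", "task2"])
b = run_experiment(["task2", "task1"])
assert a["task1"] != b["task1"]

# Same order twice: the whole experiment is reproducible.
assert run_experiment(["task1", "task2"]) == a
```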

An important note that we have to make in the spec is: "Setting a seed only guarantees the reproducibility of the (stochastic) simulation experiment with a given software (and software version), i.e. running the simulation twice with the same software setup. It does not mean that two software implementations (or even different versions of a given software) will produce the same numerical results."

[2] We should add some clarification to the spec about what reset means exactly. Currently it reads:

The \element{repeatedTask} class has a required attribute \element{resetModel} of type \code{boolean}. It specifies whether the model should be reset to the initial state before processing an iteration of the defined \hyperref[class:subTask]{subTasks}. Here initial state refers to the state of the model as given in the \element{listOfModels}. In the example in Listing~\ref{lst:repeatedTask} the repeated task is not to be reset, so a change is made, \code{task1} is carried out, another change is made, then \code{task1} continues from there, another change is applied, and \code{task1} is carried out a last time.

An additional sentence would be helpful, like: "Resetting a model includes resetting all changes applied to the model, among others resetting the initial values and the parameter values. A reset model is a model in the state it had directly after loading. A reset only affects the model; it does not affect AlgorithmParameters such as the seed."

matthiaskoenig commented 7 years ago

Probably we cannot (and should not) guarantee the reproducibility of stochastic simulations over a global SED-ML file via a seed. I.e. there will be no way to get the same numerical results for all runs. But one will get the same mean time course, and the moments of the stochastic time course will be reproducible over many runs, just not the individual runs.

Personally, I want to use SED-ML for large-scale simulations on clusters. In my opinion it should be possible, via the task structure in SED-ML, to find good patterns for parallelization and put different parts of the simulations on different cores or even servers (mainly by analyzing the dependency graph of the tasks and simulations, and the reset structure).

In such a scenario it will not be possible to set the seed globally for all these machines/cores (based on the single seed set globally in a SED-ML file). Depending on how the simulations are distributed over the different machines (which can vary every time you run a SED-ML file, due to server availability and resources), you will get different results. You also don't want to set the same seed on the different machines, because they would then run identical simulations for different parts of a task (for instance, if you put the first half of a repeated task on one machine and the second half on another, you cannot run both with the same seed).
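The duplicated-randomness problem can be demonstrated in a few lines of Python (stdlib `random` as a stand-in for the simulator RNG; the split into two "machines" is a hypothetical scenario, not anything SED-ML defines):

```python
import random

def run_half(seed, n_repeats=2):
    # One worker runs its share of a repeated task with a freshly
    # seeded RNG, returning one trajectory per repeat.
    rng = random.Random(seed)
    return [[rng.random() for _ in range(3)] for _ in range(n_repeats)]

# Naively reusing the single global seed on two machines makes them
# generate identical trajectories for *different* halves of the task:
first_half  = run_half(seed=1003)   # repeats 0-1 on machine A
second_half = run_half(seed=1003)   # repeats 2-3 on machine B
assert first_half == second_half    # wrong: duplicated randomness
```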

So in my opinion it is not a good idea to have something like a global SEED and force complete reproducibility of stochastic runs, because it will restrict SED-ML to be used on a single core (single process where you can set the SEED) on a single machine for the whole simulation. I would love to see SED-ML run on large clusters (the global SEED would generate a lot of issues in this context).

luciansmith commented 3 years ago

To revisit this: is everyone still of the opinion that the seed of an algorithm should be reset every time in a repeated task?

This is not how tellurium implements support for the seed parameter, and I don't think I'd change it to be something obviously useless; I'd be happier simply being considered non-compliant on this issue.

I agree that it would make the most sense to put the seed in some sort of global list of algorithm parameters, but barring that, some sort of 'don't reset the seed' exception seems like it would be useful on a pragmatic level.

matthiaskoenig commented 3 years ago

To revisit this: is everyone still of the opinion that the seed of an algorithm should be reset every time in a repeated task?

As far as I understand the specification, nothing is specified about setting AlgorithmParameters in RepeatedTasks. I interpret the current specification as follows:

There is an option to set algorithmParameters via a repeatedTask: you could define a range of seed values as part of the repeatedTask and set these seeds during the iterations. Thereby you would get that behavior. I never saw an example of this, but it seems to be possible.

So yes the behavior is not what you want :/

fbergmann commented 3 years ago

To revisit this: is everyone still of the opinion that the seed of an algorithm should be reset every time in a repeated task?

I don't think the seed should be reset every time! If there are stochastic effects, then maybe I could see setting it once at the beginning, before running the whole set of tasks. But ideally, I would not touch the seed at all. I would want different stochastic traces when I simulate many times, not the same ones over and over again.

matthiaskoenig commented 3 years ago

@fbergmann But for this behavior we would need global algorithm parameters as proposed in https://github.com/SED-ML/sed-ml/issues/73 Currently, to get different stochastic trajectories one would just not set a seed, but of course this gives different results every time one runs the experiment.

luciansmith commented 3 years ago

In theory, you are right, but this is completely useless behavior that nobody should implement.

Any simulator author should know that if 'seed' is set, that means they want the entire SED-ML file to be repeatable, not that they want to repeat individual repeated tasks. There is no use for this.

I agree that we should fix the semantics for L2, but for now, people should expect that simulators will not reset the seed every time.

matthiaskoenig commented 3 years ago

@luciansmith I agree. If we are all in the same boat here, we should just add a clarification sentence to the specification along the lines of:

All AlgorithmParameters must be set to the values provided in the ListOfAlgorithmParameters before the start of every simulation. The only exception is the seed, which sets the random generator seed for a simulation and should only be set once, before the first simulation for which it is defined is run.
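A Python sketch of what a simulator loop implementing this clarification might look like (the function, parameter names, and logging are hypothetical; only the KiSAO IDs come from the thread):

```python
def run_experiment(simulations, algorithm_parameters):
    """Hypothetical simulator loop for the proposed clarification:
    all parameters are (re)applied before every simulation, but the
    seed (KISAO:0000488) is applied only once, before the first run."""
    applied = []            # log of (simulation, kisaoID, value) applications
    seed_applied = False
    for sim in simulations:
        for kisao, value in algorithm_parameters.items():
            if kisao == "KISAO:0000488":
                if seed_applied:
                    continue          # never re-seed mid-experiment
                seed_applied = True
            applied.append((sim, kisao, value))
    return applied

log = run_experiment(
    ["timecourse1", "timecourse2"],
    {"KISAO:0000488": "1003", "KISAO:0000211": "1e-9"},
)
# The seed is applied once; the tolerance is re-applied per simulation.
assert sum(1 for _, k, _ in log if k == "KISAO:0000488") == 1
assert sum(1 for _, k, _ in log if k == "KISAO:0000211") == 2
```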

luciansmith commented 3 years ago

So, from what I can tell, I think we decided to both add a global list of algorithm parameters and to call out the seed as potentially meaning something unique. As such, I have added a new section 2.2.1.11:

"The listOfAlgorithmParameters container holds the AlgorithmParameter objects that apply globally. This can include parameters like a seed (KISAO:0000488) that apply to the simulation experiment as a whole, as well as algorithm parameters that might apply to all tasks of a particular type, such as the absolute tolerance (KISAO:0000211). If an AlgorithmParameter is defined for a particular Simulation, it will take precedence over any global AlgorithmParameter with the same KiSAO ID. The listOfAlgorithmParameters is optional and may contain zero or more parameters."

Then in section 2.2.7.2 (AlgorithmParameter):

"NOTE: the global ListOfAlgorithmParameters was added to SED-ML in Level 1 Version 4. As such, the only place to define a random seed (KISAO:0000488) for the simulation experiment as a whole in previous versions was in a Simulation, which might be part of a RepeatedTask. Rather than indicating that each repeat was to receive the same seed, resulting in identical traces, users would generally use the 'seed' parameter to indicate that the experiment as a whole was to be repeatable from one run to the next. Current users of SED-ML should use a global AlgorithmParameter for this purpose, but older versions or older files may be using the older scheme."

And finally in the adjusted RepeatedTask description of what to do when:

"The order of activities within each iteration of a RepeatedTask is as follows: [...]"

This is potentially controversial enough and long enough that people may have comments, so I won't close this issue yet, but the change represents how I think we should adjust the spec given this discussion.