gama-platform / gama

Main repository for developing the 2024+ versions of GAMA
https://gama-platform.org
GNU General Public License v3.0

[batch] experiment doesn't kill previous simulations when instantiating new ones in some models #282

Open lesquoyb opened 3 months ago

lesquoyb commented 3 months ago

Describe the bug

I'm running the same simulation multiple times using a batch experiment, and I noticed that in some cases old simulations are not killed, even after the until facet condition is met and a new batch of simulations has been created and is running. Here is an example model:


model batchtest

global {
    float start <- gama.machine_time;
    float seed <- 3.1415;

    reflex d {
        int size <- 100;
        let m <- matrix_with({size,size}, rnd(0,255));
        m <- shuffle(m);
        loop i from:0 to:size-1 {
            loop j from:0 to:size-1 {
                m[i,j] <- rnd(255);
            }
        }
    }

}

experiment b type:batch until:cycle>=1000 repeat:100 parallel:4{

    reflex s {
        save gama.machine_time - start to:"logs/" + int(simulation) +"/simulationDuration.txt";
    }

}

And here is what happens when I run it:

https://github.com/user-attachments/assets/3c39edc0-22a1-40a3-9234-e7c282d34b31

Maybe it's related to #198 too?

I also noticed that the average simulation step doesn't change, which is weird: it should obviously increase as more and more simulations run in parallel.

You will also notice that the reflex in the experiment is never called, even once the batch is over.

Expected behavior

Old simulations are killed when a new batch is created and run, and experiment reflexes are triggered as they used to be.


chapuisk commented 3 months ago

You should be aware of the keep_simulations facet, which is true by default in GAMA batch experiments. It means that every simulation is kept in memory until the end of the experiment. Unfortunately, it is currently not possible to pause a single simulation without pausing the whole experiment: the documentation of the pause method of the SimulationAgent class reads "Allows to pause the current simulation ACTUALLY EXPERIMENT FOR THE MOMENT. It can be resumed with the manual intervention of the user or the 'resume' action."

EDIT: the actual workaround was to unschedule simulations, that is, to remove them from the sim::thread mapping that is actually used to step simulations.

chapuisk commented 3 months ago

I think that we should step back and think about what batch experiments are made for. More specifically, permanent displays and related features, which are built on the fact that all simulations are kept in memory, should be discussed and either refactored (e.g. using serialization) or removed (making batch and headless as close as possible). @AlexisDrogoul and @ptaillandier, any opinion on that matter?

lesquoyb commented 3 months ago

You were right about the keep_simulations thing: it was true by default, and setting it to false fixed some of the problems I listed.

However, it's supposed to be false by default according to the documentation and what I've seen in the code base. Do you have any idea why mine was true by default? If I run, in the same workspace, the batch experiment in Model 13 of the prey predator tutorial, the simulations are not kept either, and there's no value explicitly set for keep_simulations, so in that case it's false by default.

Even without this, there are still two big issues here:

  1. The simulations do not stop. I guess that's what you explained in your first comment, but I must admit that I don't understand most of what you said. In addition, I do not have this problem with other batch experiments on the same computer.
  2. The reflex of the experiment is never called (or maybe it fails silently?)

chapuisk commented 3 months ago

OK! Another parameter may explain the second behavior: pref_parallel_simulations_all. When it is true (the default is false), no reflex is executed in the experiment. When this preference is set to false, experiment reflexes should trigger once a full set of repeated simulations is done (100 in your case). The first one is rather strange and might be linked to SimulationRunner not unscheduling simulations correctly.

lesquoyb commented 2 months ago

I don't have this parameter set to true, but maybe I should open a different issue for the experiment reflex not being triggered.

For the initial issue, I noticed that the problem doesn't appear all the time, and that it depends on the model and the computer.

lesquoyb commented 2 months ago

OK, I understood why the experiment only had one step: I am actually exploring only 1 set of parameters and just repeating it a large number of times. I don't know if it's the right way to do it, but at least it makes sense. Maybe that's something we can discuss together when we rethink the batch.

lesquoyb commented 2 months ago

As there has been some discussion about this, I think it's better if I summarize here what the problem is:

When running the example model, the batch experiment "realizes" that the until condition is met and initializes 4 new simulations, but the first 4 simulations are not killed, as if keep_simulations were set to true, which is not the case by default (an example is experiment explore_model in Model6 of the Luneray flu tutorials). Adding an explicit keep_simulations:false solves the issue.
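
For reference, a minimal sketch of that workaround applied to the example experiment from the first comment. The only change is the added keep_simulations facet; the rest is copied from the original model, so this is an illustration of the workaround and not a fix for the underlying scheduling problem.

```gaml
// Same experiment as in the example model, with keep_simulations set explicitly.
// With false, finished simulations should be discarded instead of kept in memory.
experiment b type:batch until:cycle>=1000 repeat:100 parallel:4 keep_simulations:false {

    // Unchanged from the original report.
    reflex s {
        save gama.machine_time - start to:"logs/" + int(simulation) +"/simulationDuration.txt";
    }

}
```

This only addresses the simulations that were not killed. Whether the experiment reflex gets triggered is a separate question, as discussed above.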