gama-platform / gama

Main repository for developing the 2024+ versions of GAMA
https://gama-platform.org
GNU General Public License v3.0

[batch][headless][server] NPE once first simulation is over #297

Open lesquoyb opened 2 weeks ago

lesquoyb commented 2 weeks ago

**Describe the bug**
When running a batch experiment through the headless server, I get a NullPointerException at the end of the first simulation of the batch:

java.lang.NullPointerException: Cannot invoke "gama.core.util.IList.get(int)" because "c" is null
    at gama.core.kernel.experiment.ExperimentAgent.createSimulation(ExperimentAgent.java:501)
    at gama.core.kernel.experiment.BatchAgent.createSimulation(BatchAgent.java:274)
    at gama.core.kernel.experiment.BatchAgent.launchSimulationsWithSolution(BatchAgent.java:327)
    at gama.core.kernel.batch.exploration.Exploration.explore(Exploration.java:198)
    at gama.core.kernel.batch.exploration.AExplorationAlgorithm.run(AExplorationAlgorithm.java:125)
    at gama.core.kernel.experiment.BatchAgent.step(BatchAgent.java:233)
    at gama.headless.core.Experiment.step(Experiment.java:144)
    at gama.headless.server.GamaServerExperimentJob.doStep(GamaServerExperimentJob.java:95)
    at gama.headless.server.GamaServerExperimentController.step(GamaServerExperimentController.java:260)
    at gama.headless.server.GamaServerExperimentController$MyRunnable.run(GamaServerExperimentController.java:93)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1570)

The whole experiment then stops. The same experiment runs without any problem in GUI mode.

**To Reproduce**
Steps to reproduce the behavior:

  1. Start the GAMA headless server
  2. Run this model:

        model batchtest

        global {
            bool end <- false;

            reflex e when: cycle = 100 {
                end <- true;
            }
        }

        experiment simulationBase type: batch until: end keep_simulations: true repeat: 100 parallel: 1 { }

  3. See error

**Expected behavior**
The experiment should not crash after the end of the first simulation.

lesquoyb commented 2 weeks ago

I've found what generates the NPE: the flowStatus of the scope passed to createAgent at line 500, in createSimulation of ExperimentAgent, is DISPOSE. The implementation tests whether the scope is interrupted (which is the case when the flowStatus is DISPOSE) and, if it is, the whole method returns null, which causes our NPE later.
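To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. Every name in it (`Scope`, `FlowStatus`, `createAgents`, the two `createSimulation…` methods) is a hypothetical stand-in written for illustration; it is not the actual GAMA code, and the guarded variant is only one possible defensive option, not a proposed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the GAMA types named in this thread; NOT the real API.
class InterruptedScopeSketch {

    enum FlowStatus { NORMAL, DISPOSE }

    static final class Scope {
        final FlowStatus flowStatus;

        Scope(FlowStatus flowStatus) { this.flowStatus = flowStatus; }

        // As described in the thread, a scope whose flow status is DISPOSE counts as interrupted.
        boolean interrupted() { return flowStatus == FlowStatus.DISPOSE; }
    }

    // Mirrors the described behaviour: an interrupted scope makes the method return null.
    static List<String> createAgents(Scope scope, int number) {
        if (scope.interrupted()) return null;
        List<String> agents = new ArrayList<>();
        for (int i = 0; i < number; i++) agents.add("simulation" + i);
        return agents;
    }

    // Unguarded caller: dereferences the result directly, like the frame at the top of the stack trace.
    static String createSimulationUnguarded(Scope scope) {
        List<String> c = createAgents(scope, 1);
        return c.get(0); // throws NullPointerException when c is null
    }

    // Guarded caller: returns null instead of throwing when no agent was created.
    static String createSimulationGuarded(Scope scope) {
        List<String> c = createAgents(scope, 1);
        return (c == null || c.isEmpty()) ? null : c.get(0);
    }

    public static void main(String[] args) {
        Scope disposed = new Scope(FlowStatus.DISPOSE);
        System.out.println("guarded result: " + createSimulationGuarded(disposed));
        try {
            createSimulationUnguarded(disposed);
        } catch (NullPointerException e) {
            System.out.println("unguarded call failed: " + e.getMessage());
        }
    }
}
```

A null check like the guarded variant would only hide the symptom; it does not explain why the scope is already in the DISPOSE state when a new simulation is requested.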

@chapuisk you recently reworked the Exploration class, which is involved here (see the stack trace). Could you have a look?

chapuisk commented 2 weeks ago

@lesquoyb The first thing I can see is a contradiction between your GAMA preferences and what you are trying to do with your experiment: it seems that pref_parallel_simulations_all is set to true, while you limit the number of parallel threads to one. However, I'll investigate further to avoid the crash in that case.

Edit: don't tell me the pref is false, because your stack trace goes through it.

lesquoyb commented 2 weeks ago

Well, maybe the problem is in how the parameters are fetched then, because I have the default ones for parallelism: pref_parallel_simulations_all is false but pref_parallel_simulations is true. Maybe there's some confusion between the two.

chapuisk commented 2 weeks ago

OK! Can you print the pref from within your experiment? Are we sure that the server workspace is exactly the same as the GUI one? Do you get the same error when the parallel facet is greater than 1?

chapuisk commented 1 week ago

@lesquoyb can you explain in more detail how you actually launch the experiment through GAMA server, so that I can reproduce the issue and see what I can do?

lesquoyb commented 1 week ago

Normally I run it through a Python script, but what you can do is start gama-server (add the parameter '-socket' followed by a port number to the headless product), then open the connector.html file in your web browser, toggle the server button, change the port number to the one you gave to headless, connect, load the model (fill in the path of the file and the experiment name, then press load/reload) and press play.

chapuisk commented 1 week ago

OK! I spotted the bug, but I don't know how we should handle it. To state it simply:

  1. The server asks the experiment to step.
  2. The step of the experiment launches the batch plan (100 simulations).
  3. Control returns to the server, which tests whether the end condition (the one specified in the batch facet until) is met.
  4. It is not (to be clear, it makes no sense in a batch experiment).
  5. The server steps the experiment again, although it has already run its 100 simulations.
  6. Error.

The part of the code responsible for this is here. A workaround could be to test if (exp.isBatch()) and not step in that case. However, it would break the way repeat works (a batch experiment step() means going to the next set of repeats). I don't know why, but the headless server workspace seems to have pref_parallel_simulations_all set to true; setting it to false is another workaround (but the issue will reappear if the pref is changed).
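The sequence described above, and the `isBatch()` workaround, can be sketched roughly as follows. This is a simplified model under stated assumptions, not the server's real code: `BatchExperiment`, `serverLoop` and `serverLoopWithBatchCheck` are hypothetical names, one `step()` is assumed to run the whole batch plan, and an `IllegalStateException` stands in for the NullPointerException seen in the report.

```java
import java.util.function.BooleanSupplier;

// Simplified, hypothetical model of the server/experiment interaction; NOT the real GAMA classes.
class BatchStepLoopSketch {

    static final class BatchExperiment {
        private final int repeat;
        private int simulationsRun = 0;
        private boolean planFinished = false;

        BatchExperiment(int repeat) { this.repeat = repeat; }

        boolean isBatch() { return true; }

        int simulationsRun() { return simulationsRun; }

        // Assumption: one step launches the whole batch plan; stepping again afterwards is an error.
        void step() {
            if (planFinished) throw new IllegalStateException("batch plan already finished");
            simulationsRun += repeat;
            planFinished = true;
        }
    }

    // Loop as described in the thread: step, test the end condition, step again if it is not met.
    static void serverLoop(BatchExperiment exp, BooleanSupplier endCondition) {
        do {
            exp.step();
        } while (!endCondition.getAsBoolean());
    }

    // Workaround sketch: never re-step a batch experiment once its first step has returned.
    static void serverLoopWithBatchCheck(BatchExperiment exp, BooleanSupplier endCondition) {
        exp.step();
        if (exp.isBatch()) return;
        while (!endCondition.getAsBoolean()) {
            exp.step();
        }
    }

    public static void main(String[] args) {
        // The end condition is never met from the server's point of view.
        BooleanSupplier neverMet = () -> false;

        BatchExperiment guarded = new BatchExperiment(100);
        serverLoopWithBatchCheck(guarded, neverMet);
        System.out.println("simulations run with batch check: " + guarded.simulationsRun());

        try {
            serverLoop(new BatchExperiment(100), neverMet);
        } catch (IllegalStateException e) {
            System.out.println("naive loop failed on the second step: " + e.getMessage());
        }
    }
}
```

As noted above, stopping after a single step has a cost: if each batch step() is meant to advance to the next set of repeats, this check would prevent the later sets from running.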