Improve robustness of the "simulation finished" check in `slim()`

bodkan commented 1 year ago

@IsabelMarleen has encountered an interesting curiosity while implementing a slendr importer from Demes. When she imports the Gutenkunst et al. example from the Demes repository, successfully generating a slendr model object, she runs into an interesting issue when she simulates it with our built-in slim() function.

Apparently it turned out to be a challenge to produce a minimum reproducible example, but she provided a serialized, compiled slendr model that reproduces this: gutenkunst_slendr_model.zip. Briefly, loading her serialized model first:

library(slendr) # v0.5.1
init_env()

model <- read_model("<path to a directory with the unzipped model data>")
plot_model(model)

It seems that Isabel's demes-to-slendr importing code works as it should (let's ignore the poor plotting of slendr for now):

However, running the model with slim() gives us:

> ts <- slim(model, sequence_length = 1000, recombination_rate = 0, verbose = TRUE)
Tree-sequence recording is on but no sampling schedule was given. Generating one for all individuals surviving to the end of the simulation.
--------------------------------------------------                            
SLiM command to be executed:

slim   \
    -d 'SAMPLES="/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpY74O5c/file79e945962461"' \
    -d 'MODEL="/Users/mp/Downloads/gutenkunst_slendr_model"' \
    -d 'OUTPUT_TS="/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpY74O5c/file79e962919bf5.trees"' \
    -d SPATIAL=F \
    -d SEQUENCE_LENGTH=1000 \
    -d RECOMB_RATE=0 \
    -d BURNIN_LENGTH=0 \
    -d SIMULATION_LENGTH=8800 \
    -d 'OUTPUT_LOCATIONS=""' \
    -d COALESCENT_ONLY=T \
    -d MAX_ATTEMPTS=1 \
    /Users/mp/Downloads/gutenkunst_slendr_model/script.slim 2>&1 
--------------------------------------------------

// Initial random seed:
1993690918446

// RunInitializeCallbacks():
SEED: 1993690918446
initializeSLiMOptions(keepPedigrees = T);
initializeTreeSeq();
initializeMutationType(0, 0.5, "f", 0);
initializeGenomicElementType(1, m0, 1);
initializeGenomicElement(g1, 0, 999);
initializeMutationRate(0);
initializeRecombinationRate(0);

// Starting run at tick <start>:
1 

Generation 1: starting the simulation
Generation 1: creating ancestral(p0)
Generation 1: split of AMH(p1) from ancestral(p0)
Generation 1: cleanup of ancestral(p0)
Generation 3201: split of OOA(p2) from AMH(p1)
[ ... snip ... ]
Generation 8801: sampling all (12300) individuals of YRI(p3)
[ note this ---> ] Generation 8801: saving the tree sequence output to '/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpY74O5c/file79e962919bf5.trees'
[ note this ---> ] Generation 8801: simulation finished
Generation 8801: geneflow YRI(p3) -> CEU(p4) set to 0%
Generation 8801: geneflow CEU(p4) -> YRI(p3) set to 0%
Generation 8801: geneflow YRI(p3) -> CHB(p5) set to 0%
Generation 8801: geneflow CHB(p5) -> YRI(p3) set to 0%
Generation 8801: geneflow CEU(p4) -> CHB(p5) set to 0%
Generation 8801: geneflow CHB(p5) -> CEU(p4) set to 0%
[ note this ---> ] Error: Unfortunately SLiM terminated before a tree sequence was saved.
[ note this ---> ] See the above for an indication of where things could have gone wrong.

The reason for the error is because I'm currently checking for the success of the simulation by making sure the last line of the slendr/SLiM log output contains "simulation finished". Not the most elaborate way to do this, but it's served me for a year now...

Now, the above is not really a problem per-se -- after all, the simulation does finish running successfully, and it does produce a valid tree sequence. I could simply check for "simulation finished" anywhere in the log output and not just at the very end.

Still, I'm surprised that the gene-flow callback function log outputs are printed after the call to sim.simulationFinished() is executed -- although they are all scheduled to happen at the end of the last generation (well, "tick", but you get my meaning), I assumed they would be executed in top-to-bottom order (gene-flow events are executed in callbacks s2 & s3, termination happens in callback s12 below.

bodkan commented 1 year ago

My first instinct was that I made a mistake in rescheduling script blocks in cases of models which have gene flow running all the way to the very end of the simulation. But this minimal model does not have the issue despite featuring gene flow active until "the present" (time 0, because it uses backward time unit specification to match Isabel's Gutenkunst model):

library(slendr)
init_env()

p1 <- population("p1", time = 1000, N = 100)
p2 <- population("p2", time = 500, N = 100, parent = p1)

gf <- gene_flow(from = p1, to = p2, start = 300, end = 0, rate = 0.1)

model <- compile_model(list(p1, p2), gene_flow = gf, generation_time = 1)
plot_model(model)

... followed by this:

> ts <- slim(model, sequence_length = 1000, recombination_rate = 0, verbose = TRUE)
Tree-sequence recording is on but no sampling schedule was given. Generating one for all individuals surviving to the end of the simulation.
--------------------------------------------------                            
SLiM command to be executed:

slim   \
    -d 'SAMPLES="/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpfmDznJ/file7f096f7ddcd7"' \
    -d 'MODEL="/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpfmDznJ/file7f0942c321c"' \
    -d 'OUTPUT_TS="/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpfmDznJ/file7f092bd4105b.trees"' \
    -d SPATIAL=F \
    -d SEQUENCE_LENGTH=1000 \
    -d RECOMB_RATE=0 \
    -d BURNIN_LENGTH=0 \
    -d SIMULATION_LENGTH=1000 \
    -d 'OUTPUT_LOCATIONS=""' \
    -d COALESCENT_ONLY=T \
    -d MAX_ATTEMPTS=1 \
    /var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpfmDznJ/file7f0942c321c/script.slim 2>&1 
--------------------------------------------------

// Initial random seed:
2008322296169

// RunInitializeCallbacks():
SEED: 2008322296169
initializeSLiMOptions(keepPedigrees = T);
initializeTreeSeq();
initializeMutationType(0, 0.5, "f", 0);
initializeGenomicElementType(1, m0, 1);
initializeGenomicElement(g1, 0, 999);
initializeMutationRate(0);
initializeRecombinationRate(0);

// Starting run at tick <start>:
1 

Generation 1: starting the simulation
Generation 1: creating p1(p0)
Generation 501: split of p2(p1) from p1(p0)
Generation 701: geneflow p1(p0) -> p2(p1) (0.000333333% over 300 generations)
Generation 1001: geneflow p1(p0) -> p2(p1) set to 0%
Generation 1001: sampling all (100) individuals of p1(p0)
Generation 1001: sampling all (100) individuals of p2(p1)
Generation 1001: saving the tree sequence output to '/var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpfmDznJ/file7f092bd4105b.trees'
Generation 1001: simulation finished
Tree sequence was saved to:
 /var/folders/d_/hblb15pd3b94rg0v35920wd80000gn/T//RtmpfmDznJ/file7f092bd4105b.trees 
Loading the tree-sequence file...

[Note that the Generation 1001: geneflow p1(p0) -> p2(p1) set to 0% now happens before saving the tree sequence and executing sim.simulationFinished().]

So there's something else I'm missing.

bodkan commented 1 year ago

For the time being (and for the purpose of her Master thesis), @IsabelMarleen implemented the fix described above, grepl()-ing for "simulation finished" in the log output fetched by the slim() function from the SLiM binary running in the background.

Still, it would be good to see just where is my assumption about the script block execution/scheduling off by nailing down the minimum possible reproducible example for this.

bhaller commented 1 year ago

Still, it would be good to see just where is my assumption about the script block execution/scheduling off by nailing down the minimum possible reproducible example for this.

IIRC, when you reschedule a script block the order that it runs in, within a tick, goes to the end (as if that script block were defined at the end of the file). See section 25.11 for details, I think. So the order in which you reschedule things will determine the order in which they run. Probably that is what is biting you.

bodkan commented 1 year ago

Thanks @bhaller! I didn't want to ping you before I found the smallest possible reproducible example, to simplify digging for the root of the problem in my SLiM code... I appreciate the hint in advance!

when you reschedule a script block the order that it runs in, within a tick, goes to the end (as if that script block were defined at the end of the file). See section 25.11 for details, I think. So the order in which you reschedule things will determine the order in which they run. Probably that is what is biting you.

I suspected something like this could be an issue. I will check the manual carefully later this week (I opened the issue so that @IsabelMarleen is not afraid to post more code in case she finds other, smaller, breaking examples :)).

As I mentioned above, because her models do simulate the correct tree sequence, I don't think this is a breaking bug. It seems to be more of an aesthetics thing (I "like" having the final log output be the "simulation finished"). Then again, when something does look weird or ugly, it can often mean that there is actually a real issue of some kind which could manifest in some other situation...

bhaller commented 1 year ago

Always best to understand a bug, even if you then decide not to expend the effort to fix it.

bodkan / slendr

Improve robustness of the "simulation finished" check in `slim()` #129