Closed bodkan closed 6 months ago
Ah, another thing to note is that there are a couple of Eidos functions provided by the built-in slendr SLiM script (a slendr "public API" of sorts), which are designed to make writing slendr customization code easier:

- `population()` -- can refer to a slendr population with its "R symbolic name" and get the corresponding Subpopulation object
- `tick()` -- computes a tick value corresponding to a given slendr time (i.e. "what's the tick corresponding to '14 thousand years before the present' in a slendr model?")
- `model_time()` -- does the opposite, converting a SLiM-based tick value to the corresponding slendr model time (i.e. "how many years before the present does tick value 1447 correspond to?")
- `save_state()` and `reset_state()` -- save/load the population state and slendr-specific population tags
- `write_log()` -- an easy logging function already used by slendr internally

And then a couple of constants which are internally used by slendr, and might be useful for user-defined SLiM customization code:

- `SIMULATION_START` and `SIMULATION_END` -- start tick (after burn-in, if there's any) and end tick of the simulation of the currently running slendr model
- `SEQUENCE_LENGTH` -- sequence length given to the `slim()` R function (if any was provided)

Also, a massive, massive shout out to @bhaller for this:
It's absolutely not an overstatement to say that this is all that made this possible. The 4.2 release came out the very same day (not kidding) that I was trying (and failing) to figure out the easiest way for users to schedule their own "events" without having to do lots of (often very complex) script block rescheduling.
I got the notification of SLiM 4.2 being released, saw this, and had everything important in place in a matter of a couple of hours.
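For anyone unfamiliar with the two time scales, here is a rough Python sketch of the kind of conversion that `tick()` and `model_time()` perform. This is not slendr's actual code. The function names, the assumption that times are given in years before the present, the assumption that the first tick after burn-in corresponds to the oldest time in the model, and the example numbers (50,000 years, 25 years per generation) are all made up for illustration.

```python
def time_to_tick(model_time, oldest_time, generation_time, burnin_ticks=0):
    """Convert a time in 'years before present' into a 1-based tick number.

    Assumption (not taken from slendr): the first tick after burn-in
    corresponds to `oldest_time`, and every tick is one generation.
    """
    generations_elapsed = round((oldest_time - model_time) / generation_time)
    return burnin_ticks + 1 + generations_elapsed


def tick_to_time(tick, oldest_time, generation_time, burnin_ticks=0):
    """Inverse of time_to_tick: convert a tick back into years before present."""
    generations_elapsed = tick - burnin_ticks - 1
    return oldest_time - generations_elapsed * generation_time


# Hypothetical model: starts 50,000 years ago, 25 years per generation.
tick = time_to_tick(14_000, oldest_time=50_000, generation_time=25)
years = tick_to_time(tick, oldest_time=50_000, generation_time=25)
print(tick, years)
```

With these made-up numbers the round trip is exact because 14,000 years falls on a generation boundary. For times that do not, the rounding step means a tick maps back to the nearest generation, not to the original value.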
Looks amazing; I will look in more detail soon!
Wow, very impressive!
Attention: Patch coverage is 85.95318% with 42 lines in your changes are missing coverage. Please review.

Project coverage is 90.38%. Comparing base (3e02fa2) to head (160161a).

Files | Patch % | Lines |
---|---|---|
R/slim.R | 72.30% | 36 Missing :warning: |
R/msprime.R | 96.20% | 3 Missing :warning: |
R/interface.R | 66.66% | 2 Missing :warning: |
R/tree-sequences.R | 83.33% | 1 Missing :warning: |
I added a bunch of new unit tests and fixed old tests broken by the new changes.
I also renamed `substitute()` to `substitute_values()` to avoid clashes with an important base R metaprogramming function.
I will now be using the new functionality extensively in the new ABC R package. This will be a huge stress test for this functionality and should reveal lurking issues. It would be amazing to have some selection-relevant examples for the paper. But this PR has already grown too big, so further problems will be resolved in individual GH issues.
Once everything looks in place I will release v1.0.
Phew. This took way more full-time work than I expected! I’m glad I brushed up on my SLiM though, it’s been too long.
(I also have to think about whether this offers a good mechanism for more elaborate tweaking of spatial slendr simulations, beyond the rather limited slendr R interface. But that’s for waaay later.)
Oh and the new vignette which describes the new functionality is now also rendered on the website. It's super rough, with broken English and probably not the clearest set of examples... but I ran out of energy in this PR and wanted to move on. :) I will be fixing the vignette as I'm polishing everything else on the main branch.
Wow! I look forward to checking this out soon. (Thanksgiving over here in the U.S. :->)
Happy Thanksgiving, friend! 🦃
Feel free to check out the relevant vignette. It’s kind of in a sorry state though, and poorly written. I put it together as a development log and to share it with a couple of slendr people who know SLiM already and were interested in adding selection to their slendr demographic models for some selection scan testing. I plan to polish this as I see the new stuff in action.
The gist of it is: the SLiM script built into slendr now provides a couple of Eidos functions (slendr's Eidos API of sorts?) that allow the user to refer to slendr-specific elements of a demographic model (as they are defined in R) in their SLiM customization script, which is then "#include"'d into that built-in script, swapping out things as needed. Like, override the default single-segment, uniform recombination rate, neutral genomic architecture mode of slendr, define non-neutral mutations, add them at specific times to a given slendr population, etc., while running all that in a slendr model as it was defined in R. Obviously this only applies to WF stuff (the only mode of operation supported by slendr, most likely forever), so it's still only a small subset of SLiM features.
I hope this doesn’t turn out to be a complete disaster. 😳🤣
This PR implements the biggest update to slendr to date — the possibility of running slendr/SLiM simulations which are non-neutral. This has been the last big remaining piece of the puzzle that took me a while to figure out. In fact, I think after this is cleaned up and polished, I will be releasing v1.0. Whatever comes down the line will likely involve adjusting / fixing / adding small things here and there, but nothing super major.
The update is too big to summarise here and an extensive description of the new functionality will become a part of a new vignette (probably finished gradually after this PR is merged). However, for posterity, here is a gist of things:
Let’s say we have the following slendr model:
We can run it in a non-spatial setting using slendr's SLiM back-end script to get a tree sequence like this:
Or we can run it with the msprime back end in a coalescent mode:
Obviously, if there’s no map provided, it makes way more sense to run things with msprime, because it’s going to be faster.
Over time, lots of people told me they would like to use slendr for some more complex demographic models (or even just for convenience, because they like R), but they need to simulate non-neutrality — adaptive introgression, background selection, polygenic traits, etc.
Very early on, we decided to limit the proportion of SLiM-specific functionality that's supported by slendr directly in its R interface. Obviously, the range of non-neutral models that can be written in SLiM is infinite, but the R interface of slendr can only be so complex while still remaining easy to write, read, and understand. The decision was to only support the part of SLiM that's needed to support slendr's (still simplified!) spatial features, and, for non-spatial models, only those features which allow writing models which could be run interchangeably through both SLiM and msprime (our `slim()` and `msprime()` functions). Adding support for anything non-neutral for slendr/SLiM simulations would necessarily a) still support only an arbitrary subset of what SLiM can do anyway and b) make the R API of slendr much more complex for models which don't deal with selection.

This update implements a simple extension mechanism which allows users to plug customized SLiM code into the built-in slendr/SLiM back-end script, making it possible to run various non-neutral models which still rely on the demographic part of slendr (like the code above), unchanged. The way we do this is to just let users write arbitrary SLiM code to do whatever they need, while still relying on the convenience of the demographic part of slendr, if that's convenient for them.
I’ll note right here that the SLiM models run by slendr are (and always will be) Wright-Fisher. That’s an assumption built at such a core layer of slendr that it doesn’t make sense to change it now.
Yes, this means that the extension mechanism for supporting selection will likely only be practically useful for non-spatial models (because fancy spatial models basically almost always need a non-WF setting, which is SLiM's domain, not slendr's).
With that out of the way, here’s how things are designed to work at the current stage using a trivial toy example — simulating a trajectory of a beneficial allele in a population over time as a function of:
First, the default mode of running SLiM models with slendr relies on this chunk of (hardcoded) code:
Obviously, this has been the root of the issue for people interested in running selection models with slendr. Not only is a single neutral mutation type baked in, but a single genomic element is hardcoded too, with no mutation rate and a fixed recombination rate.
Now let's say we have the following bit of R code encoding this SLiM "snippet" (it can also be a file). Yes, by itself it doesn't do anything -- everything will become clear soon:
The `{{thingies}}` above are parameter placeholders. They are not required, but are useful for parametrizing the model without having to hardcode fixed values. We can instantiate those parameter placeholders with a new slendr function `substitute()` like so:

`substitute()` simply plugs values of the given parameters into the template SLiM snippet (taking care of error checking, missing parameters, extra parameters, etc.). What it produces is a SLiM snippet where all the `{{thingies}}` are replaced with concrete values. The point of this is to not have to complicate things by specifying those parameters as CLI arguments when running the simulation on the command line in the background, because the `slim()` R function would need changes which would complicate its interface.

Finally, the instantiated "extension snippet" can then be used in the good old `compile_model()` like this, without changing anything else:

So, same thing as the base model above, just with one extra parameter `extension =`. Here the new version of `compile_model()` replaces the `initialize() { … }` block of slendr, which enforces neutrality by default, with the `initialize() { … }` block encoded in the user-defined SLiM extension script, and adds all the other code into the SLiM back-end script compiled for the user model.

The final, customized model can be run in exactly the same manner as the original (neutral) version above, except we don't have to specify `sequence_length =` and `recombination_rate =` because they are already provided by the extension SLiM snippet (the `slim()` function checks for this and informs the user accordingly if some required information is missing):

In this case, this simulation run produces a file `~/Desktop/traj_EUR_YAM.tsv`. In fact, because in this toy example we only care about the frequency trajectory, we could also run the model with `slim(model, ts = FALSE)`, and skip tree-sequence recording altogether.

The `substitute()` command above makes it easy to wrap it all in a function parametrized by `origin_pop` and check how the trajectory of the beneficial allele in EURopeans changes depending on how it's expected to "traverse" the admixture graph encoded by the demographic model, giving results like this:

This is only a toy example! The real power of this is of course in setting up more complex genomic architectures with exons, introns, non-uniform recombination, multiple mutations interacting in complex ways, particularly in combination with tree-sequence analyses.
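To make the placeholder mechanism described above a bit more tangible, here is a minimal Python sketch of `{{name}}`-style template substitution with the same kinds of checks (missing and extra parameters). It is only an illustration of the idea, not the R implementation in slendr. The function name `fill_template()`, the placeholder `{{s}}`, and the one-line Eidos-like template are hypothetical; only `origin_pop` is borrowed from the toy example.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template, **values):
    """Replace every {{name}} in `template` with str(values[name]).

    Raises ValueError if a placeholder has no value, or if a value
    is given for a placeholder that does not occur in the template.
    """
    needed = set(PLACEHOLDER.findall(template))
    given = set(values)
    missing = needed - given
    extra = given - needed
    if missing:
        raise ValueError(f"missing values for: {sorted(missing)}")
    if extra:
        raise ValueError(f"unknown parameters: {sorted(extra)}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


# Hypothetical one-line snippet with two placeholders.
template = 'defineConstant("s", {{s}}); defineConstant("origin_pop", "{{origin_pop}}");'
print(fill_template(template, s=0.1, origin_pop="EUR"))
```

Failing loudly on both missing and extra parameters is the useful part: a typo in a parameter name is caught when the snippet is instantiated instead of surfacing later as a confusing error inside the simulation run.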
Final words: The above should help advanced SLiM users to take any complex slendr demographic model, and overlay an arbitrary genomic architecture or selection landscape on top of it, without having to concern themselves with setting up the demographic history manually in SLiM. For these kinds of scenarios, this sort of thing hopefully leverages the strengths of both slendr and SLiM.
We currently have two concrete projects for this: one simulating a strangely behaving adaptively(?) introgressed locus between Africans-Eurasians-Neanderthals/Denisovans, another one looking at quantitative traits in complex admixture demographic models. These two will be a good test of how well this functionality works and whether there's something I've missed. In particular, I haven't looked yet at whether the way slendr deals with custom fitness effects etc. might somehow clash with user-provided customization code. We'll see.
More complex examples will be in the vignette. [EDIT: It doesn't look like the html is fully rendered when downloaded from GitHub without the figures. It will look OK after it lands on the website.]