bodkan / slendr

Population genetic simulations in R šŸŒ
https://bodkan.net/slendr
Other
54 stars 5 forks source link

Custom dispersal functions #73

Closed bodkan closed 2 years ago

bodkan commented 2 years ago

For now, the following dispersal kernels are implemented: uniform, normal, Cauchy and exponential. The center of the dispersal is always at the position of a parent (represented by 0). The only parameter characterising the dispersal (let's call it š›¾) is interpreted based on the kernel function. For uniform dispersal, a distance between [0, š›¾] is selected, for normal dispersal, a distance is drawn from Normal(0, š›¾), etc.

The position of an offspring from its parent is then determined by:

  1. Drawing a distance c from the appropriate kernel dispersal function -- this determines how far will the offspring wander off from its parent.
  2. Drawing a direction ɑ from a uniform distribution [0, 2Ļ€) -- this determines which direction will the offspring take,

and then converting these polar coordinates [c, ɑ] to cartesian [x, y] coordinates.

Note for future generations (and for myself if things break at some point):

This involved making changes all over the codebase. To make sure everything works as expected, I wrote a couple of unit tests which run a simple slendr/SLiM simulation which starts with all individuals in a single pixel in the center of a circular world in generation 1. In generation 2, offspring are allowed to disperse according to a specified dispersal kernel. Then the simulation finishes and I calculate Euclidean distance of each offspring from the position of its parent in that single generation.

I also run the same simulation but in R, sampling 1000 points in random directions around the origin [0, 0] using the R functions runif, rnorm, etc, converting from polar coordinates to an Euclidean distance. This basically represents a simple probabilistic expectation for what the SLiM backend should be doing. I use those distances (calculated from the full SLiM run and from the simple Monte Carlo distance distribution in R) to run a Kolmogorov-Smirnov test checking that the distances generated by the SLiM backend correspond to the probabilistic expectation. For easier debugging, I also compare plots of both sets of distributions with a given fixed random seed, simply by checking the MD5 hash of both images. A hack, but it does the job.

bodkan commented 2 years ago

This closes #59.

bhaller commented 2 years ago

Interesting. Are the polar coordinates really necessary? Isn't it equivalent, with some scaling factor perhaps, to draw the x and y displacements separately from the same distribution? (For the distributions you're using, anyway...?) @petrelharp this is math-y, so pinging you. :->

bodkan commented 2 years ago

In my own digging around this I found (to my initial shocked surprise) that generating x and y coordinates of an offspring by drawing x and y "shifts" from a parent from (for instance) the same normal distribution N(0, SD), will not yield points whose distances are normally distributed from the origin. Instead, the highest density of distances of such points was in a "ring" around the origin, not accumulated at the origin.

Later I found out that those (Euclidean) distances follow something called Rayleigh distribution. A useful visual-focused explanation of what is going on is in the first answer here. It boils down to "The bivariate distribution of the [distances] has its maximum at 0. However, the distribution of the distance from the center does not, since the only point contributing density there is the one at the center."

Maybe there's a more straightforward solution to this? I would love to know, because my math-fu is really weak. I picked the polar coordinates because I could empirically (*shudder*) "prove" this leads to the correct distribution of absolute distances parent-offspring.

petrelharp commented 2 years ago

Maybe this is too late, but IMO the cleaner way to do this is to draw (x, y) as independent Normals and then (optionall) scale them both by an independent factor. Normals are the unique distribution that has this property that if (X, Y) are independent draws then the bivariate distribution of (X, Y) is rotationally invariant. And, any unimodal rotationally invariant ditsribution can be made in this way.

For instance, the Cauchy is a scale mixture of Normals: if X is N(0,1) and Z is Gamma(1/2, s^2/2), then X/sqrt(Z) is Cauchy(0, s). So, setting the migration distance to (X/sqrt(Z), Y/sqrt(Z)) would get a dispersal distribution that has a Cauchy distribution in any direction, and is rotationally invariant. However, this won't have the property that dispersal distance is Cauchy. For that, you'd need a different distribution (ie to choose Z differently), and I don't know what it is.

It might be worth talking over how I implemented migration (a long time ago) in this simulation package I wrote: http://petrelharp.github.io/landsim/migration-methods.html - it works with discrete populations, but I did work out an efficient way to make individuals follow paths around barriers and things like that. (For instance, with a Cauchy distribution you'll get individuals making large jumps... even if that's right over a mountain.)

Anyhow, those are my thoughts - happy to discuss more.

petrelharp commented 2 years ago

Thinking some more - I think SLiM does it the way you have, and that's a great path to take, so never mind. (But the "going around barriers" is something to think about for the future.)

bodkan commented 2 years ago

Hello Peter, thanks so much for your thoughts on this and for the suggestions!

It's definitely not too late. I really appreciate your taking time to explain this to me! I will need to simulate a bunch of 2D points and distributions in R and make some figures to help my brain really understand the math but I think I get what's going on here.

It might be worth talking over how I implemented migration (a long time ago) in this simulation package I wrote: http://petrelharp.github.io/landsim/migration-methods.html - it works with discrete populations, but I did work out an efficient way to make individuals follow paths around barriers and things like that. (For instance, with a Cauchy distribution you'll get individuals making large jumps... even if that's right over a mountain.)

Awesome stuff. I could definitely see this being very useful for making the current SLiM backend script "smarter" in handling this. I love this vignette and all the examples! I will definitely take a closer look for an inspiration.

Thinking some more - I think SLiM does it the way you have, and that's a great path to take, so never mind.

Could explain what you mean by this? You mean how SLiM implements this internally? Bens first message seems to suggest otherwise?

bhaller commented 2 years ago

Thinking some more - I think SLiM does it the way you have, and that's a great path to take, so never mind.

Could explain what you mean by this? You mean how SLiM implements this internally? Bens first message seems to suggest otherwise?

Yeah, I'm not sure what @petrelharp meant either. SLiM doesn't implement any kind of dispersal kernel intrinsically; it is always written in the user's script. The recipes usually add an independent rnorm() draw to each spatial coordinate, which I guess results in a Rayleigh distribution ā€“ which maybe is wrong, biologically? If you guys reach consensus that that is wrong, please log a bug on SLiM that the recipes ought to be changed. I didn't try to follow the discussion above very closely (vacation mode :->). Using independent normal draws for each dimension is something I got from an old Dieckmann & Doebeli model, though, and knowing them, I would be a bit surprised if it were wrong. :->