BenjaminPeter / rangeexpansion

11 stars 2 forks source link

I need to use 1-psi to get expected results #10

Open petrikemppainen opened 1 year ago

petrikemppainen commented 1 year ago

Hi,

I'm testing this method in simulated data (spatially explicit individual based non-WF in SLiM), modelled after the grey reef sharks and found out that I had hack the code a bit to be able to use exclude.ocean=F, exclude.land=T for the "single.origin". However, I got exactly the opposite results (expansion from the population that was the last to be colonised, that also had the least diversity, I tried attaching a figure to illustrate this) unless I used 1-psi. Is this expected behaviour when working with marine species?

Since it is simulated data, "0" is always ancestral, but I tried switching this, as well as adding an ougroup (all 0's), but it made no difference (I guess there is no way to define the ancestral allele, without adding an "artificial outgroup" like this, unlike the implementation in snpR).

Thanks! Rplot07.pdf

bwaldron18 commented 1 year ago

I have a very similar issue with a post-glacial range expansion data set. The populations that I would predict to be on the end of the range expansion in the north (and have the lowest heterozygosity) are estimated as near the origin. I had tried adding a negative sign to the psi object in the 'run.regions' function, which has the same effect as your 1-psi, and places the origin of expansion where I would expect. But, because my data aren't simulated, I couldn't be sure that this was the issue. Did you find any more information on whether 1-psi is the correct fix?

petrikemppainen commented 1 year ago

I have not seen anything to suggest otherwise. This seems to stem from line 320 in R/re_functions2.R where I and j are used for columns and rows, respectively, instead of the other way around as is convention (and also consistent with all other code in this file that I have looked at). The effect is that the psi matrix is transposed so -psi and t(psi) are indeed the correct work arounds for this.

There is also a problem on 375 where "b" is used instead of "n" and the function "f.contribution" has b=2 as default instead of "b=n". This means that the code fails if you try to do anything else than n=2 (projecting down to 2 alleles). However, there is no difference in using n=2 or n=2n (no projection) so there is no reason to use anything else than n=2 and indeed this is also recommended in the original papers.

Also, you should NOT trust the p-values from the correlation between psi~d, they can be grossly inflated (trying to find a good explanation/fix for this as well which is on the to do list) and you should instead always test for significance of the psi values first. Unfortunately, there is no code provided for this in this repository and I'm trying to reverse engineer how the "block jackknifing" should be implemented but have not managed so far. However, you could use the binomial test instead. This can be achieved with (in get.psi function):

sfs <- resampled.mat * psi.mat p_bin <- binom.test(c(abs(as.integer(sfs[2,3])), abs(as.integer(sfs[3,2]))), p = 0.5)$p.value

This is provided n=2 (which is default)

However this needs further modifications for the rest of the pipeline to work as well. I realise I should fork this repository and implement all these fixes as I have also a multithreaded version of the get.psi function, but I'm still figuring stuff out.

In my simulations, the binomial test does not seem to result in any excessive false positive rates (no P-value inflation) under the null hypothesis (e.g. population expansion under panmixia) so it is a good start and at least I know how to implement it. Others have used a "permutation approach" (https://www.nature.com/articles/s41437-018-0164-0#Sec14) but they have not provided details of how this was done either (argh), so your guess is as good as mine.

I fear that many previous publications (including the above) have had the same issue as we are talking about here.

Lastly, it seems that the "edge effect" for psi is more pronounced than originally suggested leading to biases towards the middle, but I need more data/analyses to quantify this and to trust this conclusion (working on it). But of course only if -psi is used as otherwise you instead have a strong bias towards the (opposite) edge.

I have tried to contact Peter directly about this (shortly after I posted this issue) but I have not received an answer yet.

bwaldron18 commented 1 year ago

Thanks so much for your reply. I have only used this package for inferring the origin of expansion, so I had not yet encountered the other issues that you found.

I am inclined to use the results with the transposed psi matrix, but definitely want to verify that this is this is indeed the issue first. As you note, this would affect a number of previous studies that inferred range expansion origins if so.