Schraiber / selection

14 stars 7 forks source link

pop file issue #7

Open XuexueLiu opened 3 years ago

XuexueLiu commented 3 years ago

Dear Schraiber I am interest in your paper, and tried the asip data, while I cannot get your result, I post all my files and command here, please tell me is there something wrong about it? sample file 0 10 -0.078 -0.078 1 22 -0.051 -0.051 15 20 -0.014 -0.014 12 20 -0.011 -0.011 15 36 -0.004 -0.004 18 38 -0.002 -0.002 popfile
16000 0 -0.0390625 110000 -20.956 -0.2109375 110000 0 -0.307109375 195019 0 -0.414019531 107519 0 -0.540960938 82404 0 -0.691695313 124078 0 -0.870679688 171778 0 -1.083210938 181878 0 -1.335566406 112777 0 -1.63521875 45266 0 -1.991035156 20816 0 -2.41353125 16110 0 -2.915207031 23901 0 -3.510910156 53320 0 -4.21825 105982 0 -5.05815625 138531 0 -6.055472656 127587 0 -7.239695313 98370 0 -8.645859375 74423 0 -10.31555469 63543 0 -12.29817578 66523 0 -14.65235547 83610 0 -17.44775391 112343 0 -20.76702734 142428 0 -24.70838672 161440 0 -Inf ./sr -D asip.input -G 8 -N 16000 -n 500000 -d 0.001 -F 20 -f 1000 -s 100 -P horse.all.pop -e 8067 -a -o asip Thankyou so much! Best xue

Schraiber commented 3 years ago

Hi Xue,

Thanks for reaching out!

It looks like the issue is that your population size file and your input are in diffusion time units, but you're also specifying a population size using the -N command and a generation time using -g. I think that it'll run if you leave of the "-G 8 -N 16000" from the command line. Alternatively, you could multiply all the times in your files by 216000 and divide the growth rates in the popfile by 216000.

Does that seem to make sense?

By the way, I strongly recommend using this fork, which contains numerous bug fixes: https://github.com/ekirving/selection

Best, Josh

On Tue, Oct 26, 2021 at 2:25 AM XuexueLiu @.***> wrote:

Dear Schraiber I am interest in your paper, and tried the asip data, while I cannot get your result, I post all my files and command here, please tell me is there something wrong about it? sample file 0 10 -0.078 -0.078 1 22 -0.051 -0.051 15 20 -0.014 -0.014 12 20 -0.011 -0.011 15 36 -0.004 -0.004 18 38 -0.002 -0.002 popfile 16000 0 -0.0390625 110000 -20.956 -0.2109375 110000 0 -0.307109375 195019 0 -0.414019531 107519 0 -0.540960938 82404 0 -0.691695313 124078 0 -0.870679688 171778 0 -1.083210938 181878 0 -1.335566406 112777 0 -1.63521875 45266 0 -1.991035156 20816 0 -2.41353125 16110 0 -2.915207031 23901 0 -3.510910156 53320 0 -4.21825 105982 0 -5.05815625 138531 0 -6.055472656 127587 0 -7.239695313 98370 0 -8.645859375 74423 0 -10.31555469 63543 0 -12.29817578 66523 0 -14.65235547 83610 0 -17.44775391 112343 0 -20.76702734 142428 0 -24.70838672 161440 0 -Inf ./sr -D asip.input -G 8 -N 16000 -n 500000 -d 0.001 -F 20 -f 1000 -s 100 -P horse.all.pop -e 8067 -a -o asip Thankyou so much! Best xue

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Schraiber/selection/issues/7, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABP7FYOC2TBJOTZO3D3CAUDUIZXXXANCNFSM5GXJUJZQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

XuexueLiu commented 3 years ago

Thankyou for your answer, so the year=unit time 160002*8, the growth rate=-20.956/16000/2/8 Am I right?

I want to start with simple constant model, with sample file: 0 10 -20000 -20000 0 22 -13100 -13100 1 20 -3700 -3700 6 20 -3800 -3800 13 36 -2100 -2100 24 38 -500 -500 popfile 16000 0.0 -Inf Command: ./sr -D mc1r.year.input -G 8 -N 16000 -n 500000 -d 0.001 -F 20 -f 1000 -s 100 -P constant-horse.pop -e 8067 -o mc1r.year-horse.constant -a ./sr -D mc1r.year.input -G 8 -N 16000 -n 5000000 -d 0.001 -F 20 -f 1000 -s 100 -P constant-horse.pop -e 8067 -o mc1r.year-horse.constant-5e6 -a

https://drive.google.com/drive/folders/1ba-yK080GxppEAg3gspwsCENBXlBRCU7?usp=sharing

with different MCMC cycles I got different result, but neither of them showed the density of alpha1 and anpha2 near your 87.6 and 394.8. Am I missed something? Thanyou so much! Best Xue

Schraiber commented 3 years ago

Hi Xue,

I think you got it right, although I assume you have a typo, year=unit time

So, if I'm reading it correctly, your two command lines are actually just the same MCMC run, but one is longer, because you're fixing the same random seed for each (the -e command specifies a random seed). But if you take a look at the parameter traces for the longer one, you can see that something goes terribly wrong actually, and the MCMC is clearly not mixing correctly. To see this, plot the trace of lnL. Are you using the fork I sent you or the original one from my repo?

Best, Josh

On Thu, Oct 28, 2021 at 3:13 AM XuexueLiu @.***> wrote:

Thankyou for your answer, so the year=unit time 160002*8, the growth rate=-20.956/16000/2/8 Am I right?

I want to start with simple constant model, with sample file: 0 10 -20000 -20000 0 22 -13100 -13100 1 20 -3700 -3700 6 20 -3800 -3800 13 36 -2100 -2100 24 38 -500 -500 popfile 16000 0.0 -Inf Command: ./sr -D mc1r.year.input -G 8 -N 16000 -n 500000 -d 0.001 -F 20 -f 1000 -s 100 -P constant-horse.pop -e 8067 -o mc1r.year-horse.constant -a ./sr -D mc1r.year.input -G 8 -N 16000 -n 5000000 -d 0.001 -F 20 -f 1000 -s 100 -P constant-horse.pop -e 8067 -o mc1r.year-horse.constant-5e6 -a

https://drive.google.com/drive/folders/1ba-yK080GxppEAg3gspwsCENBXlBRCU7?usp=sharing

with different MCMC cycles I got different result, but neither of them showed the density of alpha1 and anpha2 near your 87.6 and 394.8. Am I missed something? Thanyou so much! Best Xue

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/Schraiber/selection/issues/7#issuecomment-953706342, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABP7FYJHNK4RJL72XFUXG33UJEO3TANCNFSM5GXJUJZQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

XuexueLiu commented 3 years ago

Thankyou a lot! I used the latest version, but the density plot showed both alpha1 and alpha2 are peak near ZERO (below), input files and command are the same. https://drive.google.com/drive/folders/1ba-yK080GxppEAg3gspwsCENBXlBRCU7

Thanks again! Best Xue

Schraiber commented 3 years ago

So, it's very clear that the MCMC was going wrong here, you can see this pretty clearly from the parameter trace from the long running one

[image: Screen Shot 2021-10-28 at 1.50.09 PM.png] it looks like it mixes okay until about sample 15000, when it just goes completely off the rails. This could be any number of things, and my first thought was that it was some numerical instability or a bug or something, which would be unfortunate because I don't really have the cycles to fix up this code (I now work outside of academia)

There have been a number of things changed in the algorithm, and it appears that this specific example may not have been robust to those changes. I think the reason is that the evidence of selection is pretty strong, and the numerics are not being stable.

However, after a bit of work, I think the issue was simply a typo in your input file :) You have

1 20 -3700 -3700 6 20 -3800 -3800

which should be

1 20 -3700 -3700 6 20 -2800 -2800

not only was this creating an issue because the time steps were out of order (I should probably have a check for this when the input files are being read in) it was having two time points INCREDIBLY close together with a major allele frequency change, which was basically posing serious numerical issues to the algorithm.

Luckily, upon fixing the typo, things basically run fine!

Nonetheless, I am still seeing alpha1 and alpha2 peak near 0. I'm not 100% sure why this is happening, as the data really does look on visual inspection like it supports a model with positive selection. I think it's possible that there were some changes to the priors over the many iterations of this algorithm. It's also possible a bug was introduced. There might be some easy things to try. But in the mean time, I think that you should try it out on your data of interest and see what happens.

Let me know if that was helpful!

Best, Josh

Schraiber commented 3 years ago

Hey Xue,

For what it's worth I think i know what the issue is---there's an additional parameter that the model now has (compared to when the paper came out) that tries to account for latent population structure, and it seems like in this particular case that parameter seems to absolutely kill the signal for this data---I suspect it's because the change is so rapid that the model thinks it's "more likely" under a model of population structure than a model with actual selection. For now, the only way to control this parameter is through editing the code directly, so I'll see if I have some time in the next week or so to implement a way for the user to control this.

Best, Josh

On Thu, Oct 28, 2021 at 3:20 PM Joshua Schraiber @.***> wrote:

So, it's very clear that the MCMC was going wrong here, you can see this pretty clearly from the parameter trace from the long running one

[image: Screen Shot 2021-10-28 at 1.50.09 PM.png] it looks like it mixes okay until about sample 15000, when it just goes completely off the rails. This could be any number of things, and my first thought was that it was some numerical instability or a bug or something, which would be unfortunate because I don't really have the cycles to fix up this code (I now work outside of academia)

There have been a number of things changed in the algorithm, and it appears that this specific example may not have been robust to those changes. I think the reason is that the evidence of selection is pretty strong, and the numerics are not being stable.

However, after a bit of work, I think the issue was simply a typo in your input file :) You have

1 20 -3700 -3700 6 20 -3800 -3800

which should be

1 20 -3700 -3700 6 20 -2800 -2800

not only was this creating an issue because the time steps were out of order (I should probably have a check for this when the input files are being read in) it was having two time points INCREDIBLY close together with a major allele frequency change, which was basically posing serious numerical issues to the algorithm.

Luckily, upon fixing the typo, things basically run fine!

Nonetheless, I am still seeing alpha1 and alpha2 peak near 0. I'm not 100% sure why this is happening, as the data really does look on visual inspection like it supports a model with positive selection. I think it's possible that there were some changes to the priors over the many iterations of this algorithm. It's also possible a bug was introduced. There might be some easy things to try. But in the mean time, I think that you should try it out on your data of interest and see what happens.

Let me know if that was helpful!

Best, Josh

XuexueLiu commented 3 years ago

Appreciate to you, I even try the correct one, the result is always the same. sample file: 0 10 -20000 -20000 0 22 -13100 -13100 1 20 -3700 -3700 6 20 -2800 -2800 13 36 -2100 -2100 24 38 -500 -500 Hope I can hear from you soon! Best Xue

xuefenfei712 commented 2 years ago

Dear Schraiber Can I hear from you about fixing the problem of selection? Best Xue