ben18785 / Selection_simulations

Simulating Wright-Fisher, Moran and Yule processes.

a puzzle about selection and frequency trajectories #12

Open Armand1 opened 5 years ago

Armand1 commented 5 years ago

I think it would be nice if we could predict the frequency trajectories of individual species from their selection coefficients. I can imagine how to do that using simulation. So, I looked at the trajectories of the five most fit and five least fit species. In general, they behave as they should: fit species increase in frequency, unfit ones decrease. But not all do. For example, Erythrina costaricensis is the fifth-most fit species, yet it declines quite strongly in frequency. Any thoughts as to why this might be?
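One way to sketch the simulation idea is a minimal Wright-Fisher-style model with constant selection coefficients. This is an assumption for illustration, not necessarily the model fitted in this repo; the function names, the three-species community, and the coefficients below are all hypothetical. It gives the expected trajectory, with an optional sampling step to add drift.

```python
import random


def expected_step(freqs, s):
    """Deterministic update: p_i' = p_i * (1 + s_i) / mean fitness."""
    weighted = [p * (1.0 + si) for p, si in zip(freqs, s)]
    mean_fitness = sum(weighted)
    return [w / mean_fitness for w in weighted]


def sampled_step(freqs, s, n, rng):
    """Stochastic update: draw n individuals from the post-selection frequencies."""
    probs = expected_step(freqs, s)
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    counts = [0] * len(probs)
    for i in draws:
        counts[i] += 1
    return [c / n for c in counts]


def trajectory(freqs, s, generations, n=None, seed=0):
    """Frequencies over time; n=None gives the expectation, an integer n adds drift."""
    rng = random.Random(seed)
    path = [list(freqs)]
    for _ in range(generations):
        current = path[-1]
        if n is None:
            nxt = expected_step(current, s)
        else:
            nxt = sampled_step(current, s, n, rng)
        path.append(nxt)
    return path


# Hypothetical community: coefficients are relative to the least fit species.
start = [0.5, 0.3, 0.2]
coeffs = [0.10, 0.05, 0.0]
path = trajectory(start, coeffs, generations=10)
print([round(p, 3) for p in path[-1]])
```

In this toy case the middle species has a positive coefficient yet declines in expectation, because its fitness (1.05) is below the frequency-weighted mean (1.065 at the start). Whether anything like that applies to a particular species here would need checking against the fitted model.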

These trajectories are from the data that I gave you. But you then bashed them into shape for your model. It would be good to get the bashed data to make sure nothing funny happened there.

Five fit species

[image: fivefit]

Five unfit species

[image: fiveunfit]

Armand1 commented 5 years ago

Now that I have the names of the species I can check whether they reflect linear regression coefficients ("estimate" on the x axis below). It doesn't look good.

[image: Rplot02]

For my full analysis see this:

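checking_estimates_of_selection_on_species.pdf https://github.com/ben18785/Selection_simulations/files/3406976/checking_estimates_of_selection_on_species.pdf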

I can't help but think that the names have somehow gotten scrambled in the selection coefficient output

ben18785 commented 5 years ago

Ok, yes, that doesn't look right. Let me check!


Armand1 commented 5 years ago

An additional puzzle.

I estimated the regression coefficients on your bashed data. They disagree with your selection coefficients. That's a problem.

However, I also decided to check the bashed data against the original data that I gave you. Here are two plots. Each point is a count for a species in a given year. They should be identical -- have a slope = 1 -- except for those species which have been given subscripts (XXX_1) which would have been excluded by the merge. But they're not.

[image: Rplot06]

Here's one particularly common species.

[image: Rplot07]

I just don't get it. For this species (Heisteria concinna), your bashed data has 30,000-40,000 individuals per year. The data I sent you has a few hundred.

Here's confirmation that the data have been scrambled. Your "Heisteria concinna" has the data for my "Hybanthus prunifolius".

| Data_bashed_by_Ben | N | Data given to Ben | N |
| --- | --- | --- | --- |
| Heisteria concinna | 38697 | Hybanthus prunifolius_1982 | 38697 |
| Heisteria concinna | 41510 | Hybanthus prunifolius_1985 | 41088 |
| Heisteria concinna | 41016 | Hybanthus prunifolius_1990 | 40439 |
| Heisteria concinna | 36886 | Hybanthus prunifolius_1995 | 36058 |
| Heisteria concinna | 32723 | Hybanthus prunifolius_2000 | 31925 |
| Heisteria concinna | 28343 | Hybanthus prunifolius_2010 | 27845 |
| Heisteria concinna | 26869 | Hybanthus prunifolius_2015 | 26869 |
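The count-by-count check described above can be sketched as a keyed comparison on (species, year). This is a minimal illustration, not the script actually used; `compare_counts` is a hypothetical helper, and the numbers are a few of the counts quoted in this thread.

```python
def compare_counts(original, bashed):
    """Return {(species, year): (n_original, n_bashed)} for every key whose counts differ.

    A key missing from one side is reported with None for that side.
    """
    mismatches = {}
    for key in sorted(set(original) | set(bashed)):
        n_orig = original.get(key)
        n_bash = bashed.get(key)
        if n_orig != n_bash:
            mismatches[key] = (n_orig, n_bash)
    return mismatches


# Counts quoted in this thread for two census years.
original = {
    ("Heisteria concinna", 1982): 151,
    ("Heisteria concinna", 2015): 207,
    ("Hybanthus prunifolius", 1982): 38697,
    ("Hybanthus prunifolius", 2015): 26869,
}
bashed = {
    ("Heisteria concinna", 1982): 38697,
    ("Heisteria concinna", 2015): 26869,
    ("Hybanthus prunifolius", 1982): 7,
    ("Hybanthus prunifolius", 2015): 0,
}

for (species, year), (n_orig, n_bash) in compare_counts(original, bashed).items():
    print(f"{species} {year}: given={n_orig} bashed={n_bash}")
```

Comparing on an explicit key rather than on row position means a reordering in either file shows up as mismatched counts instead of passing silently.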

Note: the name scrambling has happened twice. First, from my "BCI data forben reproductives only" --> your (bashed) "reprodutives names_stan_rds".

Second, from your bashed data --> the selection coefficients. The latter must be true since I estimated the linear regressions from your bashed data and they don't agree either.

So, implausible though it may seem, there have been two name errors. Unless I've fucked all this up!

ben18785 commented 5 years ago

Yes, this is all weird, sorry. I am looking through it now and have found the same thing.

On Thu, Jul 18, 2019 at 5:05 PM Armand1 notifications@github.com wrote:


|     | spec_year | N_Data_given_to_Ben | species | N_Data_bashed_by_Ben |
| --- | --- | --- | --- | --- |
| 690 | Heisteria concinna_1982 | 151 | Heisteria concinna | 38697 |
| 691 | Heisteria concinna_1985 | 162 | Heisteria concinna | 41510 |
| 692 | Heisteria concinna_1990 | 165 | Heisteria concinna | 41016 |
| 693 | Heisteria concinna_1995 | 176 | Heisteria concinna | 36886 |
| 694 | Heisteria concinna_2000 | 184 | Heisteria concinna | 32723 |
| 695 | Heisteria concinna_2010 | 196 | Heisteria concinna | 28343 |
| 696 | Heisteria concinna_2015 | 207 | Heisteria concinna | 26869 |
| 729 | Hybanthus prunifolius_1982 | 38697 | Hybanthus prunifolius | 7 |
| 730 | Hybanthus prunifolius_1985 | 41088 | Hybanthus prunifolius | 3 |
| 731 | Hybanthus prunifolius_1990 | 40439 | Hybanthus prunifolius | 0 |
| 732 | Hybanthus prunifolius_1995 | 36058 | Hybanthus prunifolius | 0 |
| 733 | Hybanthus prunifolius_2000 | 31925 | Hybanthus prunifolius | 0 |
| 734 | Hybanthus prunifolius_2010 | 27845 | Hybanthus prunifolius | 0 |
| 735 | Hybanthus prunifolius_2015 | 26869 | Hybanthus prunifolius | 0 |


ben18785 commented 5 years ago

Right; sorry. It was an issue with the newly introduced species messing up the order. I have checked the counts on the new data frame versus the old and all (I think) seems ok. Can you rerun your analyses with the new data? Note there are new selection estimates too as the names were messed up in that file as well.

Ben
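As a hedged illustration of how a newly introduced species can shift labels: if names and values are kept in separate lists and only the names are re-sorted, every label after the insertion point moves. The actual cause in this repo's code is not shown in the thread, so this is only one plausible mechanism; `label_by_position`, `label_by_key`, and "Example species_1" with its count are hypothetical.

```python
def label_by_position(names, values):
    """Pair names and values by index; only safe if both share the same order."""
    return dict(zip(names, values))


def label_by_key(records):
    """Build the mapping from (name, value) records, so ordering is irrelevant."""
    return {name: value for name, value in records}


# Values held in the original species order, with a new species appended.
# "Example species_1" and its count are hypothetical.
records = [
    ("Heisteria concinna", 151),
    ("Hybanthus prunifolius", 38697),
    ("Example species_1", 7),
]
values = [value for _, value in records]

# Re-sorting the names alone shifts each label relative to its value.
sorted_names = sorted(name for name, _ in records)

print(label_by_position(sorted_names, values))
print(label_by_key(records))
```

Keeping each name attached to its value, and joining on the name, avoids depending on two lists staying in the same order.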


Armand1 commented 5 years ago

OK, can you point me to the new data?

ben18785 commented 5 years ago

Oh, thought it was attached. See your email.


Armand1 commented 5 years ago

The bashed data are fixed relative to the data I gave you. So that's good!

The bad news, I think, is that the selection names are still not fixed relative to the bashed data. This is a version of the same plot as above, just with the axes flipped (it seems more intuitive).

We'd expect that as s_freqinde-50 increases, the estimate should become more positive. But it doesn't. I have been racking my brains wondering if we are misunderstanding this in some way, but I don't see it. Can you look at the selection names again?

[image: Rplot04]

While you're about it, could you make the selection coefficients relative to max(s_freqinde-50), rather than max(s_freqinde) or max(s_freqinde-97.5), which it is now? I want the main estimate of the best species to be zero.
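The requested re-centring could look roughly like the sketch below, assuming the coefficients are on an additive scale so that "relative to" means subtracting the largest median. The species names, the numbers, and the "s_freqinde-2.5" column are hypothetical; only "s_freqinde-50" and "s_freqinde-97.5" are named in this thread.

```python
def recentre_on_best_median(estimates, median_key="s_freqinde-50"):
    """Subtract the largest median from every quantile of every species."""
    best = max(row[median_key] for row in estimates.values())
    return {
        species: {name: value - best for name, value in row.items()}
        for species, row in estimates.items()
    }


# Hypothetical posterior summaries; "s_freqinde-2.5" is an assumed column name.
estimates = {
    "species A": {"s_freqinde-2.5": -0.30, "s_freqinde-50": -0.20, "s_freqinde-97.5": -0.10},
    "species B": {"s_freqinde-2.5": -0.15, "s_freqinde-50": -0.05, "s_freqinde-97.5": 0.00},
}

recentred = recentre_on_best_median(estimates)
for species, row in recentred.items():
    print(species, {name: round(value, 3) for name, value in row.items()})
```

Shifting the summary quantiles is only an approximation; a fuller version would probably re-centre each posterior draw before summarising, since the intervals of a difference are not the difference of the intervals.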

ben18785 commented 5 years ago

I'm not sure what you're plotting. What is "estimate"? There doesn't seem to be much variation in this quantity...


Armand1 commented 5 years ago

See the PDF above: it's the slope of a regression of frequency vs. census year, one for every species.
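For reference, a per-species slope of this kind can be computed with ordinary least squares as sketched below. This is not the code behind the PDF; the helper names are hypothetical. It regresses the raw counts quoted in this thread on census year, because the yearly totals needed to turn counts into frequencies are not given here.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slopes_by_species(years, series):
    """One slope per species; series maps species -> values in the same order as years."""
    return {species: ols_slope(years, values) for species, values in series.items()}


years = [1982, 1985, 1990, 1995, 2000, 2010, 2015]
# Counts from the data given to Ben, as quoted in this thread.
series = {
    "Heisteria concinna": [151, 162, 165, 176, 184, 196, 207],
    "Hybanthus prunifolius": [38697, 41088, 40439, 36058, 31925, 27845, 26869],
}

for species, slope in slopes_by_species(years, series).items():
    print(f"{species}: {slope:.2f} individuals per year")
```

With frequencies in place of counts, the same function would give the "estimate" plotted against the selection coefficients.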

ben18785 commented 5 years ago

This is a graph of my estimated beta coefficients vs the % change in frequency (from 1982-2010) for each variant. To me, this looks like the estimation is working.

freq_inde_beta_vs_change.pdf

Similarly, for estimated selection coefficients from the independent model.

freq_inde_s_vs_change.pdf
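A rough numerical companion to these plots is to check how often the sign of each estimate matches the sign of the observed change. The sketch below is an assumption about how such a check might be done, not the code that produced the PDFs: the helper names and the estimate values are hypothetical, and it uses the 1982 and 2010 counts quoted in this thread as a stand-in for frequencies.

```python
def percent_change(start, end):
    """Percentage change from start to end."""
    return 100.0 * (end - start) / start


def sign_agreement(estimates, changes):
    """Fraction of shared species whose estimate and change have the same sign.

    Zero is grouped with negative values here.
    """
    shared = [species for species in estimates if species in changes]
    agree = sum(1 for sp in shared if (estimates[sp] > 0) == (changes[sp] > 0))
    return agree / len(shared)


# 1982 and 2010 counts from the data given to Ben (counts, not frequencies).
changes = {
    "Heisteria concinna": percent_change(151, 196),
    "Hybanthus prunifolius": percent_change(38697, 27845),
}
# Hypothetical estimates, for illustration only.
estimates = {
    "Heisteria concinna": 0.02,
    "Hybanthus prunifolius": -0.03,
}

print({species: round(change, 1) for species, change in changes.items()})
print(sign_agreement(estimates, changes))
```

If the names were scrambled, the agreement would be expected to sit near one half rather than close to one.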

@Armand1 What do you think?

Armand1 commented 5 years ago

Well, scrambled data could hardly produce that result. Then there’s something funny about the relationship between selection coefficients and regression coefficients. I’ll probe a bit more.