ctlab / GADMA

Genetic Algorithm for Demographic Model Analysis

Model with no population split? #94

Open kristinaleilani opened 2 months ago

kristinaleilani commented 2 months ago

Hello, does GADMA consider models with no population split?

I am comparing two fish populations from distant locations that show little genetic differentiation, and I'm not actually sure whether they are divergent populations. When I run GADMA, the best model shows a recent split with migration. I've tried running with only Initial structure [1,1], and with both Initial structure [1,1] and Final structure [2,1]; both outputs show a recent split with migration. Was a model with no split considered? Or is my best model better than one with no split?

Thanks in advance!

noscode commented 2 months ago

Hi @kristinaleilani,

That is a very good question. By default, GADMA does not consider a history without a population split for two populations. However, you can easily do it yourself. You have to rebuild your data as data for one population and run demographic inference on it. If you have VCF data, you need to rewrite the popmap file with population labels so that all samples belong to one population. If you have an SFS file generated for dadi, you can call the marginalize([0]) function, which sums over the first population and leaves a 1D spectrum (dadi manual here). Once the data is ready, you can run GADMA inference for one population. Just as you use the [1, 1] and [2, 1] structures for two populations, you can use the [1], [2] and [3] structures for one population.
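The popmap step described above could be sketched as follows. This is a minimal illustration, not part of GADMA: it assumes a whitespace-separated popmap with the sample name in the first column and the population label in the second, and the function name `merge_popmap` and the label `pooled` are hypothetical.

```python
def merge_popmap(lines, label="pooled"):
    """Return popmap lines with every sample assigned to a single population.

    Assumes each non-empty line is '<sample> <population>' (assumed format).
    """
    merged = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment lines
        sample = stripped.split()[0]  # keep the sample name, drop the old label
        merged.append(f"{sample}\t{label}")
    return merged


# Hypothetical sample names and site labels, for illustration only.
example = ["fish_01\tsiteA", "fish_02\tsiteA", "fish_03\tsiteB"]
print("\n".join(merge_popmap(example)))
```

Writing the returned lines to a new file and pointing the inference at it leaves the original two-population popmap untouched, so both analyses can be repeated.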

It is unclear to me, though, what the best way is to determine which model is better, because you will basically have two models fitted to two different datasets (1D and 2D). You say that the 2D model has a very recent split event; I agree that this indicates the two populations are probably one population. Unfortunately, I do not know of any statistical way to choose between 1D and 2D models. Because they are fitted to different datasets, you can use neither likelihood nor AIC values to compare them.

Best regards, Ekaterina

z0on commented 2 months ago

Hi Ekaterina - this is Misha, Kristina's collaborator - many thanks for the speedy response (as usual)!

We used to solve this with models that run on the same two-pop dataset but don't allow any time integration after the split. This basically models the situation in which the "two populations" are simply two samples from the same population. I'm pretty sure such a model would be AIC-comparable to the one with a real split. Can GADMA do something like that?
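To make the "two samples from the same population" idea concrete, here is a small sketch of what such a model predicts. It is not GADMA or dadi code and the function names are hypothetical. It assumes an unfolded spectrum and, for the example input only, a constant-size neutral population whose expected 1D SFS is theta / i. With zero time after the split, a site carrying k derived alleles among all n1 + n2 chromosomes distributes them between the two samples hypergeometrically.

```python
from math import comb


def neutral_sfs(n, theta=1.0):
    """Expected unfolded 1D SFS for n chromosomes: theta / i for i = 1..n-1."""
    return [0.0] + [theta / i for i in range(1, n)] + [0.0]


def no_split_joint_sfs(sfs_1d, n1, n2):
    """Joint SFS of two samples (sizes n1 and n2) drawn from one population.

    Entry [i][j] is the 1D entry for k = i + j derived alleles multiplied by
    the hypergeometric probability that i of them fall in the first sample.
    """
    n = n1 + n2
    if len(sfs_1d) != n + 1:
        raise ValueError("sfs_1d must have n1 + n2 + 1 entries")
    joint = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            k = i + j
            joint[i][j] = sfs_1d[k] * comb(n1, i) * comb(n2, j) / comb(n, k)
    return joint


joint = no_split_joint_sfs(neutral_sfs(4), n1=2, n2=2)
for row in joint:
    print(["%.3f" % x for x in row])
```

Because this prediction lives on the same 2D grid as a model with a real split, both can be scored against the same two-population spectrum.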

Cheers Misha Matz


noscode commented 2 months ago

Hi Misha,

Nice to hear from you. Hmm, I see: you want the epoch after the split to have zero duration, right? This makes sense to me. I think the only way to achieve this right now is to use a custom model. I guess you can still perform an automatic inference for one population using structures, and then add an additional split in the model code for the AIC comparison.

I could consider allowing GADMA to infer an [X, 0] structure (which would do the same), but I am not sure how difficult that would be or how much time it would take to implement.
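Once both models have been fitted to the same two-population spectrum, the AIC comparison mentioned above is simple arithmetic. The sketch below uses the standard definition AIC = 2k - 2 ln L; the log-likelihoods and parameter counts are made-up placeholders, not results from any real run.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Placeholder values for illustration only; substitute the fitted values.
aic_no_split = aic(log_likelihood=-1250.0, n_params=2)
aic_with_split = aic(log_likelihood=-1245.0, n_params=5)

delta = aic_no_split - aic_with_split
print(f"AIC no split: {aic_no_split}, AIC with split: {aic_with_split}, delta: {delta}")
```

The comparison is only meaningful when both likelihoods are computed on the same data, which is why the zero-time split model has to be evaluated on the 2D spectrum rather than on a pooled 1D one.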

Best regards, Ekaterina