brannala / BA3

BA3 is a software package for estimating rates of migration between populations using multilocus genotype data.
GNU Affero General Public License v3.0

Unrealistically high migration rates #8

Open tomoosting opened 7 months ago

tomoosting commented 7 months ago

Hi,

I've been running BA3 to estimate migration rates between 2 fish populations but I'm getting some unexpected results. I used BA3-SNPS and the finetuning script to help set the mixing parameters.

For the actual analyses I used BA3-3.0.5 with the following command:

BA3SNP -v -u -g -t -b500000 -i5000000 -n100 -m 0.05 -a 0.15625 -f 0.0046875 -s$SEED -o $OUT/$SET'_s'$SEED'_BA3.out' $DIR/$SET'.ba3' 1> $OUT/$SET'_s'$SEED'_std.out'

I expected unequal migration rates between the populations, but one of the estimated rates is unrealistically high. I suspect the program has trouble with the low level of genetic divergence between the two populations.

0->East 1->West
Migration Rates:
m[0][0]: 0.9833(0.0063) m[0][1]: 0.0167(0.0063)
m[1][0]: 0.3281(0.0217) m[1][1]: 0.6719(0.0217)

The program keeps converging on m[1][0] of ~0.33 in all my runs. Most West [1] individuals were assigned second-generation migrant ancestry. For example:

Migrant ancestry>> [0,0]:0.000 [1,0]:0.024 [0,1]:0.000 [1,1]:0.000 [0,2]:0.976 [1,2]:0.000
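For comparing several runs, the summary block can be read programmatically. The Python sketch below is only an illustration: it parses entries of the form shown above and checks that each row sums to about 1. It also flags off-diagonal estimates that sit close to 1/3, on the assumption (not stated in this thread) that BA3 caps the migrant proportion of a population at 1/3. Please check the manual before relying on that. The names `parse_rates` and `near_bound` are hypothetical.

```python
import re

# Pattern for entries such as "m[1][0]: 0.3281(0.0217)" in the BA3 summary.
RATE_RE = re.compile(r"m\[(\d+)\]\[(\d+)\]:\s*([0-9.]+)\(([0-9.]+)\)")

# Assumption (not from this thread): migrant proportions are bounded at 1/3.
MIGRANT_BOUND = 1.0 / 3.0


def parse_rates(text):
    """Return {(i, j): (mean, sd)} for every m[i][j] entry found in text."""
    return {
        (int(i), int(j)): (float(mean), float(sd))
        for i, j, mean, sd in RATE_RE.findall(text)
    }


def near_bound(rates, tolerance=0.01):
    """List off-diagonal (i, j) whose mean lies within tolerance of 1/3.

    With two populations a single off-diagonal entry equals the total
    migrant proportion of population i, so checking entries is enough.
    """
    return [
        key for key, (mean, _sd) in sorted(rates.items())
        if key[0] != key[1] and MIGRANT_BOUND - mean <= tolerance
    ]


summary = ("m[0][0]: 0.9833(0.0063) m[0][1]: 0.0167(0.0063) "
           "m[1][0]: 0.3281(0.0217) m[1][1]: 0.6719(0.0217)")
rates = parse_rates(summary)
row_sums = {i: sum(m for (r, _c), (m, _s) in rates.items() if r == i) for i in (0, 1)}
flagged = near_bound(rates)
print(row_sums, flagged)
```

If the bound assumption holds, an estimate flagged this way may reflect the limit of the parameter space rather than the data, which is one more reason to compare independent runs.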

Is there anything I can do to improve my runs? I've looked into setting prior distributions, but that isn't an option in BA3. I've made sure that all SNPs included in the analyses pass a minor allele frequency (MAF) filter of 0.05 in both populations, but this only slowed down convergence. Any suggestions would be much appreciated :)

Attached are the trace file, output, stdout, and ancestries for each individual from my last run: BA3trace.txt, snapper_norm_s182_BA3out.txt, snapper_norm_s182_stdout.txt, BA3indiv_no_genotypes.txt

brannala commented 7 months ago

Hi Tom, What are the sample sizes for each of the two populations? If you send me your input file I can look into these results. Bruce

tomoosting commented 7 months ago

Hi Bruce, I've sent you an email with a link containing the necessary files. Many thanks, Tom

tomoosting commented 5 months ago

Dear Bruce,

Have you had a chance to look at my results? I'd be very interested in knowing whether I can use BA3 or have to move to something else. Many thanks,

Tom

brannala commented 5 months ago

Hi Tom, Sorry to be slow on this. I am exploring some analyses using small subsets of a few hundred loci to see whether convergence is an issue. So far I am seeing similar patterns. Is it possible that the two population sizes are very different? That will lead to very different proportions of migrants even if the individual migration rates are similar in both directions.
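As a rough illustration of this point, the Python sketch below uses a deliberately simplified one-generation model in which every individual in either population leaves with the same probability. The census sizes and the rate are hypothetical numbers chosen only to show the asymmetry. They are not estimates for these populations, and this is not how BA3 computes its estimates.

```python
def migrant_fraction(n_source, n_recipient, emigration_rate):
    """Fraction of the recipient population made up of immigrants after one
    generation, when every individual in both populations leaves with the
    same probability emigration_rate."""
    immigrants = emigration_rate * n_source
    residents = (1.0 - emigration_rate) * n_recipient
    return immigrants / (immigrants + residents)


# Hypothetical census sizes and per-individual rate, for illustration only.
N_EAST = 3_000_000
N_WEST = 1_000_000
RATE = 0.05

into_west = migrant_fraction(N_EAST, N_WEST, RATE)
into_east = migrant_fraction(N_WEST, N_EAST, RATE)
print(f"migrant fraction in West: {into_west:.4f}")
print(f"migrant fraction in East: {into_east:.4f}")
```

With these made-up inputs the smaller population ends up with a migrant fraction of about 0.136 and the larger one about 0.017, even though the per-individual rate is identical in both directions.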

tomoosting commented 5 months ago

Dear Bruce,

No worries, I'm already very grateful you're willing to have a look at my dataset.

Yes, the East population will have a significantly larger population size. It's been hard to get reliable estimates of Ne because it's a fisheries species that is very abundant. The species is Australasian snapper in New Zealand, which supports the largest inshore fishery in the country. To give you an idea of the difference in abundance, the management areas that predominantly cover the East population have a TACC (total allowable commercial catch) of 4,815 tonnes, while the management areas that cover the West population have a TACC of 1,584 tonnes. Also, genetic differentiation between the populations is very low (FST = 0.0031). There are two regions on opposite sides of the North Island of NZ where mixing between the populations is most likely occurring. I have a few admixed individuals and was also hoping to get more information about the ancestry of the individuals using the -g flag.

Based on ocean currents I'm expecting migration to be higher from East to West. But perhaps the larger size of the East population will make this very difficult to estimate. Do you think it is possible to get reliable estimates of migration rates if the population sizes are very different? I'm also exploring migrate-n, which is already tricky with unphased SNP data.

Many thanks again for your help and insights. Best, Tom

tomoosting commented 4 months ago

Dear Bruce,

Sorry to keep bothering you. I was hoping to get your insight on whether you think it's possible to obtain reliable migration rates for my populations using BA3, or perhaps another program such as migrate-N. Following your previous email, I ran NeEstimator to learn more about the difference in population size.

[image: Ne estimates from NeEstimator]

I tried cutting corners by using a subset of loci, but ended up running my entire neutral dataset. The estimates using a MAF of 0.05 are likely the most trustworthy, and these don't actually differ that much. But knowing that this approach doesn't work well for species with large population sizes, I don't know how useful these estimates are.

Would it be possible that, given the low genetic differentiation (FST = 0.0031), a higher migration rate in one direction results in the appearance of many hybrids in the sink population? That is what I see when I run BA3:

BA3 output (0->East 1->West)
Migration Rates:
m[0][0]: 0.9833(0.0063) m[0][1]: 0.0167(0.0063)
m[1][0]: 0.3281(0.0217) m[1][1]: 0.6719(0.0217)

I've also done a number of runs in migrate-N. This program doesn't really lend itself to WGS data, but I found a way to generate reliable input.

[image: migrate-N output]

The migration rates and genetic diversity were very similar between the two populations. So it could be that the effective population sizes are actually not that different (based on NeEstimator and migrate-N). But if both effective population sizes and migration rates are similar, I cannot explain why BA3 is giving me a different answer.

I guess what I'm asking is whether you have any insights that help explain my results. I'll stop bothering you after this email, but I was hoping to get some final advice :).

Thank you again for your help. Cheers, Tom

brannala commented 4 months ago

Hi Tom. No problem, sorry to be out of the loop; I was travelling.

It is actually census population size that matters for BA3, not effective population size, so it is a bit difficult to interpret Ne or theta estimates in relation to this problem.

Also, we have found that migrate estimates can be unreliable in some circumstances, possibly due to a bug, and it can only handle a few hundred loci. It also expects sequences at each locus, not SNPs, so you need to fill in the invariant sites using the reference sequence.

If you have a reference genome and can generate sequences for each locus, you could try our program BPP, which estimates migration rates and Ne (see https://www.pnas.org/doi/abs/10.1073/pnas.2310708120 ) and can handle a few thousand loci. If you need help generating loci from the VCF file and reference sequence, I have a postdoc who could share her bash scripts with you. She has been doing this with bcftools for human data from the Thousand Genomes Project.

Programs like migrate and BPP estimate historical rates and Ne over the period since the populations shared a common ancestor, which could be hundreds of thousands of years; the current population sizes could be very different.

One thing I suggest is that you try running subsets of, say, 500 loci in BA3 to see how consistent the estimates are. That number of loci should be quite informative and the program should converge more rapidly; there is really no need to use all your markers simultaneously. If different subsets converge to similar estimates, that at least indicates that the difference in estimated migrant proportions you are seeing may be real.

Bruce
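Drawing random subsets of loci for such runs could be sketched in Python roughly as follows. This assumes the input file has one whitespace-separated record per line in the order individual, population, locus, allele 1, allele 2. Please confirm that layout against the BA3 manual and your own file. The function names and file paths are hypothetical.

```python
import random


def subsample_loci(lines, n_loci, seed):
    """Keep only the records belonging to n_loci randomly chosen loci.

    Assumes each record is whitespace-separated as
    individual, population, locus, allele1, allele2.
    """
    records = [line for line in lines if line.strip()]
    loci = sorted({line.split()[2] for line in records})
    if n_loci > len(loci):
        raise ValueError(f"requested {n_loci} loci but only {len(loci)} present")
    # A seeded generator makes each subset reproducible.
    chosen = set(random.Random(seed).sample(loci, n_loci))
    return [line for line in records if line.split()[2] in chosen]


def write_subset(in_path, out_path, n_loci=500, seed=1):
    """Read a BA3-style file, subsample loci, and write the reduced file."""
    with open(in_path) as handle:
        subset = subsample_loci(handle.readlines(), n_loci, seed)
    with open(out_path, "w") as handle:
        handle.writelines(
            line if line.endswith("\n") else line + "\n" for line in subset
        )
```

Calling `write_subset("all_loci.ba3", "subset_seed1.ba3", seed=1)` and again with other seeds would give several different 500-locus files to run separately; both file names here are placeholders.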

tomoosting commented 4 months ago

Dear Bruce,

Thank you for your reply; this already clarifies a number of things for me. It makes sense that census size (Nc) has a much larger effect on migration than Ne. I'll get some runs going with a reduced dataset of 500 SNPs for BA3.

To run migrate-N, I used the linked sites option, where I created 100 loci, each with 200 (independently segregating) linked sites. I also randomly reassigned REF and ALT alleles for each individual so heterozygous genotypes wouldn't always segregate out the same way. I had spent some time trying to recreate sequences, but without having phasing information, I couldn't work out how to reconstruct the haplotypes.
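A minimal Python sketch of this kind of random assignment is shown below. It is not the actual script used for these runs. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2) per site, and the function name `pseudo_haplotypes` is hypothetical.

```python
import random


def pseudo_haplotypes(genotypes, ref_alleles, alt_alleles, rng):
    """Split unphased diploid genotypes into two pseudo-haplotypes.

    genotypes holds alternate-allele counts per site (0, 1 or 2).
    At heterozygous sites (count 1) a coin flip decides which
    haplotype receives the reference allele.
    """
    hap_a, hap_b = [], []
    for count, ref, alt in zip(genotypes, ref_alleles, alt_alleles):
        if count == 0:
            pair = (ref, ref)
        elif count == 2:
            pair = (alt, alt)
        elif count == 1:
            pair = (ref, alt) if rng.random() < 0.5 else (alt, ref)
        else:
            raise ValueError(f"unexpected genotype code: {count}")
        hap_a.append(pair[0])
        hap_b.append(pair[1])
    return "".join(hap_a), "".join(hap_b)


rng = random.Random(7)
hap_a, hap_b = pseudo_haplotypes([0, 1, 2, 1], "ACGT", "GTAC", rng)
print(hap_a, hap_b)
```

The resulting haplotypes are arbitrary with respect to the true phase, so any linkage information between heterozygous sites within a locus is scrambled rather than recovered.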

Thank you for the suggestion! I'll start working on getting a run going in BPP. I would really appreciate it if your postdoc is willing to share her script for generating the input file. How do you get around not knowing the phase of the genotypes? Or, if you need phased data, would you recommend randomly assigning REF and ALT alleles for closely linked sites as some sort of workaround? Perhaps this is not that much of an issue for species with large population sizes.

Many thanks again for your insights. Cheers, Tom

brannala commented 4 months ago

Hi Tom, BPP has an option for unphased data, or you could phase each population using a program such as Beagle. The most important thing is that you have a reference genome so that you can recreate sequences from the VCF file. I am CCing Anna Nagel, the postdoc in my lab, who could provide scripts for creating BPP input from your VCF file. She is teaching at a workshop in England right now, so she may be a bit slow responding. Best wishes, Bruce
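The idea of recreating a locus sequence from the reference plus SNP calls can be sketched in Python as below. This is not the lab's bcftools-based script. It is a toy illustration that assumes SNP-only calls, that any site without a call is homozygous for the reference base, and that unphased heterozygotes are written as IUPAC ambiguity codes. Whether that encoding is what BPP's unphased option expects should be checked in the BPP documentation. The function name `locus_sequence` is hypothetical.

```python
# IUPAC ambiguity codes for unordered pairs of distinct bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def locus_sequence(reference, start, snps):
    """Build one individual's sequence for a locus.

    reference: reference bases for the locus window.
    start: 1-based genomic coordinate of the first base in reference.
    snps: {position: (allele1, allele2)} unphased diploid calls.
    Sites without a call keep the reference base (an assumption that
    ignores missing data and indels).
    """
    seq = list(reference.upper())
    for pos, (a1, a2) in snps.items():
        idx = pos - start
        if not 0 <= idx < len(seq):
            raise ValueError(f"position {pos} lies outside the locus")
        a1, a2 = a1.upper(), a2.upper()
        # Homozygous calls use the base itself; heterozygous calls use a code.
        seq[idx] = a1 if a1 == a2 else IUPAC[frozenset((a1, a2))]
    return "".join(seq)


seq = locus_sequence("ACGTACGT", start=101, snps={102: ("C", "T"), 105: ("G", "G")})
print(seq)
```

In this toy case the heterozygous C/T call at position 102 becomes `Y` and the homozygous G call at 105 replaces the reference `A`, while all other sites keep the reference base.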

tomoosting commented 4 months ago

Amazing, thank you very much! I do have a reference genome so I should be good to try and run BPP.

Many thanks, Tom

On Wed, 29 May 2024 at 22:56, Bruce Rannala @.***> wrote:

Hi Tom, BPP has an option for unphased data or you could phase each population using a program such as beagle. The most important thing is that you have a reference genome so that you can recreate sequences from the vcf file. I am CCing the postdoc in my lab Anna Nagel who could provide scripts for creating BPP input from your vcf file. She is teaching at a workshop in England right now so she may be a bit slow responding. Best wishes, Bruce

Bruce Rannala @.***

On Wed, May 29, 2024 at 1:35 PM Tom Oosting @.***> wrote:

Dear Bruce,

Thank you for your reply, this already clarifies a number of things in my head. It makes sense that Nc has a much larger effect on migration than Ne. I'll get some runs going with a reduced dataset of 500 SNPs for BA3.

To run migrate-N, I used the linked sites option, where I created 100 loci, each with 200 (independently segregating) linked sites. I also randomly reassigned REF and ALT alleles for each individual so heterozygous genotypes wouldn't always segregate out the same way. I had spent some time trying to recreate sequences, but without having phasing information, I couldn't work out how to reconstruct the haplotypes.
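The random allele assignment described above might look roughly like the following. This is only a sketch of one possible reading of that step: the function name `pseudo_phase` and the genotype representation (a list of allele pairs per site) are assumptions, and the result is an arbitrary pseudo-phasing, not a reconstruction of the true haplotypes.

```python
import random


def pseudo_phase(genotypes, seed=0):
    """Split unphased genotypes into two pseudo-haplotypes.

    genotypes: list of (allele1, allele2) tuples, one per linked site.
    At each heterozygous site the two alleles are assigned to the two
    haplotypes in random order, so REF does not always land on the same
    haplotype. This does NOT recover the real phase.
    """
    rng = random.Random(seed)
    hap_a, hap_b = [], []
    for a1, a2 in genotypes:
        if a1 != a2 and rng.random() < 0.5:
            a1, a2 = a2, a1  # swap the alleles at this heterozygous site
        hap_a.append(a1)
        hap_b.append(a2)
    return "".join(hap_a), "".join(hap_b)


# Hypothetical individual with four linked sites (REF/ALT alleles as bases).
sites = [("A", "A"), ("A", "G"), ("C", "T"), ("G", "G")]
print(pseudo_phase(sites, seed=42))
```

Homozygous sites are unaffected, and every site still carries the same pair of alleles; only the assignment of heterozygous alleles to the two sequences changes.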

Thank you for the suggestion! I'll start working on getting a run going on BPP. I would really appreciate it if your postdoc is willing to share her script to get the input file. How do you get around not knowing the phase of the genotypes? Or if you need phased data, would you recommend randomly assigning REF and ALT alleles for closely linked sites as some sort of workaround? Perhaps this is not that much of an issue for species with large population sizes.

Many thanks again for your insights. Cheers, Tom

On Tue, 28 May 2024 at 18:50, Bruce Rannala @.***> wrote:

Hi Tom. No problem, sorry to be out of the loop -- I was travelling. It is actually census population size that matters for BA3, not effective population size, so it is a bit difficult to interpret Ne or theta estimates in relation to this problem.

Also, we have found migrate estimates can be unreliable in some circumstances, possibly due to a bug, and it cannot handle many loci. It expects sequences at each locus, not SNPs, so you need to fill in the invariant sites using the reference sequence; it can only handle a few hundred loci. If you have a reference genome and can generate sequences for each locus you could try using our program BPP, which estimates migration rates and Ne (see https://www.pnas.org/doi/abs/10.1073/pnas.2310708120 ); it is able to handle a few thousand loci. If you need help generating loci from the VCF file and reference sequence I have a postdoc that could share her bash scripts with you -- she has been doing this for human data from the Thousand Genomes Project using bcftools. Programs like migrate and BPP estimate historical rates and Ne over the period since the populations shared a common ancestor, which could be hundreds of thousands of years; the current population sizes could be very different.

One thing I suggest is that you try running subsets of, say, 500 loci in BA3 to see how consistent the estimates are. That number of loci should be quite informative and the program should converge more rapidly; there is really no need to use all your markers simultaneously. If different subsets converge to similar estimates, that at least indicates that the difference in estimated migrant proportions you are seeing may be real. Bruce
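Drawing random subsets of loci for separate BA3 runs could be scripted along the following lines. This is a minimal sketch that assumes the input file has one whitespace-separated record per line in the order individual ID, population ID, locus ID, allele 1, allele 2; the function name `subsample_loci` and the example records are hypothetical.

```python
import random


def subsample_loci(lines, n_loci, seed=0):
    """Keep only the records belonging to a random subset of loci.

    lines: records of the assumed form 'indivID popID locusID allele1 allele2'.
    Returns the records (in their original order) for n_loci randomly chosen
    locus IDs. Different seeds give different subsets.
    """
    records = [ln for ln in lines if ln.strip()]
    loci = sorted({ln.split()[2] for ln in records})
    if n_loci > len(loci):
        raise ValueError(f"requested {n_loci} loci but only {len(loci)} present")
    chosen = set(random.Random(seed).sample(loci, n_loci))
    return [ln for ln in records if ln.split()[2] in chosen]


# Tiny made-up example: two individuals typed at three loci.
example = [
    "ind1 East loc1 A T",
    "ind1 East loc2 C C",
    "ind1 East loc3 G A",
    "ind2 West loc1 A A",
    "ind2 West loc2 C T",
    "ind2 West loc3 G G",
]
for record in subsample_loci(example, n_loci=2, seed=1):
    print(record)
```

Running the same analysis on several subsets drawn with different seeds and comparing the resulting estimates is one way to carry out the consistency check suggested above.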

On Tue, May 28, 2024 at 8:00 AM Tom Oosting @.***> wrote:

Dear Bruce,

Sorry to keep bothering you. I was hoping to get your insight on whether you think it's possible to obtain reliable migration rates from my populations using BA3 or perhaps any other program like migrate-N. Following your previous email, I ran NeEstimator to get a little more information on the difference in population size. Ne estimates from NeEstimator [image: image.png] I tried cutting corners by using a subset of loci, but ended up running my entire neutral dataset. The estimates using maf 0.05 will likely be the most trustworthy, and they actually don't differ that much. But knowing that this approach doesn't work well for species with large population sizes, I don't know how useful these estimates are.

Would it be possible that, given genetic differentiation is low (FST 0.0031), a higher migration rate in one direction would result in the appearance of many hybrids in the sink population? That is the case when I run BA3:

Output BA3 (0->East, 1->West)
Migration Rates:
m[0][0]: 0.9833(0.0063) m[0][1]: 0.0167(0.0063)
m[1][0]: 0.3281(0.0217) m[1][1]: 0.6719(0.0217)

I've also done a number of runs in migrate-N. This program doesn't really lend itself to WGS data but I found a way to generate reliable input. Output migrate-N [image: image.png] The migration rates and genetic diversity were very similar between the two populations. So it could be that the effective population sizes are actually not that different (based on NeEstimator and migrate-N). But if both effective population sizes and migration rates are similar, I cannot explain why BA3 is giving me a different answer.

I guess what I'm asking is whether you have any insights that help explain my results. I'll stop bothering you after this email but I was hoping to get some final advice :).

Thank you again for your help. Cheers, Tom

On Mon, 29 Apr 2024 at 17:39, Tom Oosting @.***> wrote:

Dear Bruce,

No worries, I'm already very grateful you're willing to have a look at my dataset.

Yes, the East population will have a significantly larger population size. It's been hard to get reliable estimates of Ne because it's a fisheries species that is very abundant. The species is Australasian snapper in New Zealand, which supports the largest inshore fishery in the country. To give you an idea of the difference in abundance, the management areas that predominantly cover the East population have a TACC of 4,815 tonnes, while the management areas that cover the West population have a TACC of 1,584 tonnes. Also, genetic differentiation is very low between the populations (FST 0.0031). There are two regions on opposite sides of the North Island of NZ where mixing between both populations is most likely occurring. I have a few admixed individuals and was also hoping for more information about the ancestry of the individuals using the -g flag.

Based on ocean currents, I'm expecting migration to be higher from East to West. But perhaps the larger population size of the East population will make it very difficult. Do you think it would be possible to get reliable estimates of migration rates if the population sizes are very different? I'm also exploring migrate-N, which is already tricky with unphased SNP data.

Many thanks again for your help and insights. Best, Tom

On Mon, 29 Apr 2024 at 02:51, Bruce Rannala @.*** wrote:

Hi Tom, Sorry to be slow on this. I am exploring some analyses using small subsets of a few hundred loci to see whether convergence is an issue. So far I am seeing similar patterns. Is it possible that the two population sizes are very different? That will lead to very different proportions of migrants even if the individual migration rates are similar in both directions.


brannala commented 3 months ago

Hi Tom, If you email me at brannnala@ucdavis.edu I will forward your email to Anna Nagel. She is happy to share her BPP formatting scripts with you. Bruce