Closed lcespedesarias closed 1 day ago
Hi Laura,
Hope that helps! -Gideon
Sorry - forgot to add that it's possible that the model is a poor fit to your system because the value of K is too high. Do you get sensical results for smaller values of K?
Hi Gideon, Thank you so much for your thorough answer! It was all very helpful.
Thank you so much for responding so thoroughly and kindly! I will try the things that I mentioned above for now, but if you have any additional comments/suggestions are the moment they are super welcome!
Thanks!
All the best,
Laura
I'm surprised that, even after collapsing individuals into populations, you still are losing ~75% of your SNPs! Although maybe if these are RADseq data and there's a lot of allelic dropout, that's not as surprising? I would definitely recommend dropping individuals with more missing data before scrapping the whole analysis. A thousand SNPs is still a lot of data! And, if you see geographic signal in a PCA or in pairwise Fst, I would guess that there's sufficient information in the data to parameterize the model. Does the inferred pattern of IBD look like it provides a reasonable fit to the data at K=1 for the spatial model?
Hi Gideon,
Thank you for your reply! Because of your comment, I realized that I am dropping a lot of SNPs before collapsing individuals into populations because I am using very strict missing data filters (in vcftools, before importing data to R). I think is a relict of previous runs when I was running analyses at the individual level. I will try calling SNPs again with less strict filters and will expect that I lose a lot less then since the no missing data filter would apply per population and not individual.
To answer your other question: the inferred pattern of IBD does looks like it provides a somewhat reasonable fit for K=1, but there is at least one marked discrete break in genetic structure (based on other analyses) so at a glance K=2 seems like it fits quite reasonably too. That is a very good point though, I have not tested K=1 in my runs yet, which I should do.
I will definitely keep you updated when the runs with (hopefully) more SNPs run!
Thanks so much!
Best,
Laura
Sounds good! Do you want to keep this issue open, or is it resolved?
Hi Gideon,
Yeah, I think it is resolved. Thanks so much for all your input!
Best,
Laura
Hello!
I am running ConStructor a dataset of approx. 212 individuals, and 4,906 SNPs. When I talk about "complex" genetic structure I mean that this is an complex of populations in which there is one stable hybrid zone in one area, deep genetic divergence associated with a geographic break in another, other areas of very low genetic structure, and so on. Is a widely distributed subspecies/species complex.
I have been able to run the analyses, but I am getting results that don't look very sensible, and for most K values this is (I think) due to the lack of convergence. I am running 4 chains of 1'000.000 steps, using average allele frequencies per locality. But that length of chains does not seem to be enough for convergence. I increased the number of steps to 2'000.000 but it has been running for a week and it seems like it will take a lot longer. If you have any advice on how to make runs more efficient, they are very much appreciated!
Something that I am thinking is that it maybe make sense to do analyses for different portions of the species complex separately. For example, there is a south and a north group that are quite distinct based on other analyses, and I could run independent construct analyses for each group. Would this make sense and potentially help to "simplify" the analysis so convergence can be achieved in a reasonable time?
I used the strategy of averaging allele frequencies per locality to help the running time issue. However, I also have doubts that this is a good idea given the existence of a hybrid zone within the complex. Which implies that individuals in the same place might have very different genetic backgrounds. Do you think in my case I should avoid working with this average and working with the individual values instead?
I am attaching the R script for the K=5 run, and the resulting plots for the spatial model. Any advice on any of this are very much appreciated!
All the best,
Laura
construct_complex_run_K5.R.zip complex_s_k5_pie.map.chain_1.pdf complex_s_k5_structure.plot.chain_1.pdf complex_s_k5_trace.plots.chain_1.pdf