quinn-ca closed this issue 1 year ago
Hi Quinn,
2a. I would probably stop at K=2 if the third layer at K=3 contributes such a small amount.
2b. For Pictures 1 and 2, which is spatial and which is nonspatial?
2c. Was this just supporting information for 2a & 2b, or is it its own question?
2d. If you matched layers across runs before plotting layer contributions, then layer contributions will be consistent across values of K.
3a. Yes, this is mostly useful for visualization. Sometimes it's neat to check whether the shape of the curves in different layers matches your expectations (like we saw in the conStruct paper comparing results for Populus trichocarpa and balsamifera).
3b. I believe that you're misreading the plot. I think the "outlier point" you're referring to is actually the diagonal of the sample allelic covariance matrix, which describes each sample's relatedness to itself, and which is automatically fixed at 0.25. Remember that each of the layer-specific covariance curves is being weighted by its admixture proportions in order to describe each observation of pairwise relatedness. That's why the "model.fit" plots (hopefully) look good even though the layer covariance curves might not model the data on their own particularly well.
3c. If your layer covariance curves are overlapping, that indicates that each layer displays a similar rate of isolation by distance.
3d. As we note in the paper, there's a difference between statistical and biological significance. Your cross-validation results show strong and unambiguous support for the spatial K=3 model over all other models. But, maybe that statistical support comes at the expense of the biology, and the spatial K=2 model makes a lot more sense given what you know about your system. One note is that the IBD curves within each layer may be hard to see over the scale of the x-axis and y-axis at which you're viewing them. That is, there may be meaningful IBD within the red & blue layers, but you just can't see it on that particular plot.
3e. There's no way to edit the output of these graphs using the built-in functions in the R package, but you can certainly use the code as a starting point to make whatever plots you wish! If you want to see inside the code of a particular function in the conStruct package, you can either type its name into your console without any arguments (e.g., make.all.the.plots) or, if it's not an exported function, you can access it using the ::: syntax (e.g., conStruct:::plot.layer.covariances).
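As a minimal sketch of what that looks like in an R session (assuming the conStruct package is installed):

```r
# Load the package so its exported functions are on the search path
library(conStruct)

# Typing an exported function's name with no parentheses prints its source
make.all.the.plots

# Unexported (internal) functions can be inspected with the ::: operator
conStruct:::plot.layer.covariances
```

Either line prints the function body to the console, which you can then copy and adapt for custom plots.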
You can load multiple R objects by loading one, assigning it to a new variable, then loading another. E.g.
load("K1_sp_conStruct.results.Robj")
k1out <- conStruct.results
load("K2_sp_conStruct.results.Robj")
k2out <- conStruct.results
load("K3_sp_conStruct.results.Robj")
k3out <- conStruct.results
# etc.
Yes, the make.admix.pie.plot function (as called from inside make.all.the.plots) visualizes the MAP admixture proportions for each chain. If you want to look at heterogeneity in the MAP estimates across chains, you can make admixture pie plots (or structure-style plots) for each independent chain.
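A sketch of that per-chain comparison, assuming a saved results object and a user-supplied `coords` matrix of sampling coordinates (the file name and variable names here are placeholders):

```r
library(conStruct)

# Load a saved results object; it contains one element per MCMC chain
# (e.g., chain_1, chain_2, ...)
load("K2_sp_conStruct.results.Robj")

# Draw one admixture pie plot per independent chain so the MAP
# estimates can be compared across chains by eye
for (chain in names(conStruct.results)) {
    make.admix.pie.plot(
        admix.proportions = conStruct.results[[chain]]$MAP$admix.proportions,
        coords = coords  # user-supplied matrix of sampling coordinates
    )
}
```

If the chains have converged, the per-chain plots should look essentially identical up to layer-label swapping.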
hope that helps!
Thank you for your helpful answers and clarifications!
The non-spatial results are most similar to what I have found with PCA and pairwise-Fst analyses. The southwest site clustered by itself in the PCA, with other sites overlapping. Pairwise Fst showed the biggest differences between the southwest site and all others; pairwise Fst between the two eastern sites was not significant. Notably, the two northwest sites were combined in pairwise Fst estimates because their combined sample size is 19 individuals.
3b. Your answer about the "outlier" being the diagonal in the sample allelic covariance matrix makes sense and is what I started to uncover when I looked at the matrix itself, confirming this is very helpful, thanks! The model fit plots seem to look good for the spatial model at K = 2 (Picture 10), although there may be some deviations at K = 3 (Picture 11). The model fit of the pruned dataset may be inappropriate at K = 2 (Picture 12), but particularly at K = 3 (Picture 13). For comparison, I’ve also included model fit graphs for the full-dataset, non-spatial model (K = 2; Picture 14). To confirm, I’m assessing these graphs to ensure there aren’t patterns across the MCMC chains, correct? Are there other deviations in the model fit plots I should be assessing?
I really appreciate your help! I’m meeting with my committee member soon to discuss my results, but they aren’t familiar with this program, so I want to make sure I have a solid interpretation to build our discussion on.
Quinn
Attachments: Picture 5 through Picture 14
> To confirm, I’m assessing these graphs to ensure there aren’t patterns across the MCMC chains, correct? Are there other deviations in the model fit plots I should be assessing?
The "model fit" plots are specifically the "Cov/geoDist" plots (in the lower right-hand corner of several of the uploaded figures). The other plots output for MCMC diagnosis show parameter trace plots (estimates of the parameter value over the duration of the MCMC run). Ideally, the "model fit" plots show that the model is able to generate a parametric covariance matrix that is similar to (and captures the major trends of) the empirical data. And the trace plots should look like "fuzzy caterpillars" that show consistent results across independent runs of the same model (i.e., they're converging on the same stationary distribution).
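As a sketch, these diagnostic figures can be regenerated from the objects a run saves to disk (assuming the results and data.block files written out by a conStruct analysis with the given prefix; file names here follow that pattern but are placeholders):

```r
library(conStruct)

# Both objects are written out by a conStruct() run with this prefix
load("K2_sp_conStruct.results.Robj")  # loads `conStruct.results`
load("K2_sp_data.block.Robj")         # loads `data.block`

# Re-draws the trace plots, model-fit ("Cov/geoDist") plots, admixture
# plots, etc. for every chain in the results object
make.all.the.plots(conStruct.results = conStruct.results,
                   data.block = data.block,
                   prefix = "K2_sp_rerun")
```

This is convenient when you want a fresh set of diagnostic PDFs without re-running the MCMC.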
Hi Quinn,
Just doing some book-keeping here. Are you ok to close this issue?
Hi Quinn,
I'm going to close the issue, but feel free to re-open if you have further questions/concerns on this point.
@gbradburd Hello, are independent runs of the same model called "chains" or "replicates"?
In the main conStruct function, there is an option for the number of chains to run, but not replicates, whereas the x.validation function only has an option for the number of replicates, not the number of chains.
Are chain and replicate the same thing? If not, how are they different? And which should I use to assess what you described in the vignette: "Ideally, multiple independent runs converge on the same stationary distribution, with similar parameter estimates and posterior probabilities."?
Best, Dahn-young
Hi Dahn-young,
Chains and replicates are different things in this context. A chain here is being used as defined in rstan - as an independent MCMC analysis. A replicate is a cross-validation replicate. The cross-validation procedure is discussed in depth in the "Cross-validation Procedure" section of the conStruct paper appendix.
If that's the section of the vignette you're reading, it sounds like you want to run multiple independent chains for a particular conStruct analysis.
Hope that helps! -gideon
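A sketch of where each option lives, assuming the conStruct() and x.validation() interfaces described in the package vignette (freqs, geoDist, and coords are user-supplied objects, and the argument values shown are placeholders):

```r
library(conStruct)

# `n.chains` = number of independent MCMC chains for a single analysis;
# use this to check convergence across runs of the same model
my.run <- conStruct(spatial = TRUE, K = 2,
                    freqs = freqs, geoDist = geoDist, coords = coords,
                    prefix = "K2_sp", n.chains = 3, n.iter = 5000)

# `n.reps` = number of cross-validation replicates, i.e., independent
# training/testing partitions of the data
my.xval <- x.validation(train.prop = 0.9, n.reps = 8, K = 1:3,
                        freqs = freqs, geoDist = geoDist, coords = coords,
                        prefix = "xval", n.iter = 5000)
```

So to assess convergence on the same stationary distribution, increase `n.chains` in conStruct(); `n.reps` in x.validation() only controls how many times the data are re-partitioned for model comparison.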
Hello again, thank you/apologies in advance for the number of questions. I greatly appreciate any guidance you can provide!
Layer contributions
a. When K = 3, the third layer has an extremely small contribution (~0.1%). However, at K = 4, the fourth layer contributes moderately. Is this worth considering, or should I ‘stop’ at the first value of K where layers begin to contribute marginally? (see Picture 1)
b. I’m seeing increasing subdivision among layers at increasing values of K in my spatial model, but less so in the non-spatial model. How might I interpret this? The paper addressed spurious clusters in non-spatial models because the model is trying to partition clinal differences into discrete groups, but I’m less sure how to interpret the opposite case, which I seem to have. (see Pictures 1 and 2)
c. I reviewed the phi values for the spatial models at K = 3 and K = 4 (conStruct.results$chain_1$MAP$layer.params$layer_k), based on an earlier question (https://github.com/gbradburd/conStruct/issues/48)
------> Phi values for the K = 3 spatial model: layer 1: 2.41e-4, layer 2: 7.53e-5, layer 3: 5.07e-4
------> Phi values for the K = 4 spatial model: layer 1: 1.64e-5, layer 2: 4.09e-5, layer 3: 6.58e-5, layer 4: 9.36e-5
d. Are the individual layers consistent across values of K for a specific model?
------> For example, the contribution of layer 3 (when K = 3 and K = 4) is ~0.1%, while layer 4 contributes 27% when K = 4
------> I used the function match.layers.x.runs prior to plotting the layer contributions; by doing so, I take it that, yes, layer 3 is the same across all values of K (for which it was assessed)
Layer covariance curves
… (conStruct.results$chain_1$MAP$par.cov), and each sample, when compared with itself, had a value approximately at this value; I’m not sure if this is what is being graphed, however
c. If my layer covariance curves are mostly overlapping, how should I interpret this (Picture 3; layers 1 and 2)?
d. Additionally, although the CV predictive accuracy of the spatial model outperformed the non-spatial model (indicative of IBD; Picture 4), this was marginal at K = 2. I don’t see a decay in correlation with distance in the layer covariance curves (Picture 3). Is this indicative of IBD in my system, then?
e. Can I edit the output of the graphs? In the paper, for example, the data points on the layer covariance curves are color-coded.
Please let me know if you need any clarification on what I've asked above, and thanks again. I've also attached these questions in a Word document in case the formatting doesn't transfer well.
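Because layer labels are arbitrary across independent runs, comparing a given layer across values of K relies on matching the layers first. A sketch of that step (assuming two saved results objects, one per value of K; file and variable names are placeholders):

```r
library(conStruct)

# Load MAP admixture proportions from two runs (e.g., K = 3 and K = 4)
load("K3_sp_conStruct.results.Robj"); k3 <- conStruct.results
load("K4_sp_conStruct.results.Robj"); k4 <- conStruct.results

# Find the ordering of layers in the K = 4 run that best matches
# the layers of the K = 3 run, so "layer 3" means the same thing
# in both plots of layer contributions
layer.order <- match.layers.x.runs(
    admix.mat1 = k3$chain_1$MAP$admix.proportions,
    admix.mat2 = k4$chain_1$MAP$admix.proportions
)
```

The returned ordering can then be used to reorder layers before plotting layer contributions across values of K.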
Questions_conStruct_Results.docx
Picture1.pdf Picture2.pdf Picture3.pdf Picture4.pdf