Closed JimStarrett closed 1 month ago
Hi, can you please supply your data, IMAP, and contro file? It is very difficult to say anything definitive without being able to reproduce your analysis.
As a debug step, I recommend you run the program in gdi estimation mode [where all merge or split proposals are accepted] by setting gdi_threshold = None
in the control file, and seeing the resulting gdi values.
Thanks for the files, I modified your control file to lower the execution time, first by manually providing tau and theta priors (this sidesteps the automatic inference of these priors, which is quite costly on such a large alignment file), and also by lowering the number of loci being used in the calculations to 8. I also lowered the number of samples, and removed the lines corresponding to migration, since these were commented out by you.
I set the thresholds such that all splits will be accepted just to see what the ballpark gdi values are.
# output folder
output_directory = results
# input files
Imapfile = Aty_hhsd_60lump_IMAP.txt
seqfile = Atypoides_hhsd_935p_phased_renamed.phy
# guide tree
guide_tree = (((((G,H),(I,J)),(K,(M,L))),(F,(E,D))),(A,(B,C)));
# hierarchical algorithm settings
mode = split
gdi_threshold = >=0.0, >=0.0
# BPP MCMC settings
threads = 8
burnin = 500
nsample = 1000
nloci = 8
tauprior = 3 0.01
thetaprior = 3 0.01 e
If you run with this control file, it will give you gdi values for all groups eventually, and you can check if the rest of the values make sense.
I got the following results for the first split:
< Analysis started >
> Starting state of split analysis
Number of species in starting delimitation: 1
'GHIJKMLFEDABC'
--GHIJKMLFEDABC
*** Iteration 1 ***
> Estimated tau and theta parameters:
theta 2.5% HPD 97.5% HPD tau 2.5% HPD 97.5% HPD
ABC 0.025282 0.022110 0.029684
GHIJKMLFED 0.102492 0.093182 0.110447
GHIJKMLFEDABC 0.018354 0.012818 0.026323 0.010411 0.009639 0.011213
> Proposal results:
node 1 GDI 1 2.5% HPD 97.5% HPD node 2 GDI 2 2.5% HPD 97.5% HPD split accepted?
GHIJKMLFED 0.18 0.17 0.2 ABC 0.56 0.51 0.63 True
Number of species after iteration 1: 2
'GHIJKMLFED', 'ABC'
/-GHIJKMLFED
--|
\-ABC
...
The gdi for population A in the case two populations A and B with no migration is calculated using the following formula:
$$ gdi = 1 - e^{-2\tau_{AB} / \theta_A} $$
so low gdi values at the first split could be due to either the tau or theta parameters. Do you have any idea if the inferred values match previous literature results about the age and genetic diversity of the group you are working on?
In fact, if you have some literature derived ballpark values for these figures, feel free to set them manually as priors. The manual has a section on how to do this.
Finally, the "reference" gdi thresholds given in previous papers or our paper are somewhat arbitrary (even though they were calibrated on a significant number of cases). It may be the case that they are not appropriate for the spiders you are working on. Perhaps try the program on some closely related groups where species status is known with high confidence, and check if the inferred gdi values match those biologically grounded expectations.
However, since there does not seem to be an issue related to the hhsd software itself, I will close this thread.
Hello, I'm trying to understand some preliminary results with my dataset where the merge analysis is retaining 13 species (ie not merging any groups), but the split analysis is not accepting even the deepest split in the tree. The dataset is for a highly structured species complex, so no surprise about not merging any groups, but I would expect the lower bound on number of species to be more than 1. The data has >100 phased loci for >100 individuals. Post-burnin values look reasonable in Tracer.
It's not clear to me what about the data is influencing the gdis for the initial split to not meet the threshold. Any suggestions of what to look for in the results files?
thank you