Closed nicolashazzi23 closed 1 day ago
Hi, sorry for the issue. Can you provide your control file, Imap file, and sequence data?
I seem to have this same error message when attempting to test for migration between non-sister taxa, or across clade groups.
I seem to have this same error message when attempting to test for migration between non-sister taxa, or across clade groups.
Please supply your sequence data, control file, and Imap file, so we can reproduce the issue.
@JimStarrett provided some sample files in a comment this other issue (https://github.com/abacus-gene/hhsd/issues/11#issuecomment-2241326549).
I ran a slightly modified control file that shortens the runtime (by reducing the number of samples, and the number of loci used in the calculations, as well as manually setting tau and theta priors to avoid their automatic inference). The acceptance thresholds are also set in a way where all modifications will be accepted. This is mainly useful to check if the program crashes at any point in the delimitation procedure.
# output folder
output_directory = res_m
# input files
Imapfile = Aty_hhsd_60lump_IMAP.txt
seqfile = Atypoides_hhsd_935p_phased_renamed.phy
# guide tree
guide_tree = (((((G,H),(I,J)),(K,(M,L))),(F,(E,D))),(A,(B,C)));
# migration
migration = {A <-> K}
# migration prior
migprior = 2 20
# hierarchical algorithm settings
mode = merge
gdi_threshold = <=1.0, <=1.0
# BPP MCMC settings
threads = 8
burnin = 500
nsample = 1000
nloci = 8
tauprior = 3 0.01
thetaprior = 3 0.01 e
In this control file, the A and K groups, between which migration should be estimated, are not sisters.
Interestingly, the program seems to run for a couple iterations, but crashes at the point where we have gone down to three species:
< Analysis started >
> Starting state of merge analysis
Number of species in starting delimitation: 13
'G', 'H', 'I', 'J', 'K', 'M', 'L', 'F', 'E', 'D', 'A', 'B', 'C'
/-G
/-|
| \-H
/-|
| | /-I
| \-|
/-| \-J
| |
| | /-K
| \-|
/-| | /-M
| | \-|
| | \-L
| |
| | /-F
--| \-|
| | /-E
| \-|
| \-D
|
| /-A
\-|
| /-B
\-|
\-C
*** Iteration 1 ***
> Estimated tau and theta parameters:
theta 2.5% HPD 97.5% HPD tau 2.5% HPD 97.5% HPD
A 0.019631 0.014933 0.023459
F 0.012701 0.010541 0.015283
B 0.006156 0.004449 0.008592
C 0.003362 0.002219 0.004725
K 0.003785 0.002196 0.006243
E 0.005623 0.004123 0.007226
D 0.005038 0.003299 0.007058
G 0.006727 0.004292 0.008853
H 0.010986 0.008692 0.013645
I 0.002549 0.001584 0.003504
J 0.013883 0.009611 0.017134
M 0.023535 0.019284 0.027940
L 0.021137 0.016125 0.024779
GHIJKMLFEDABC 0.026594 0.015709 0.037058 0.009110 0.008461 0.009823
GHIJKMLFED 0.005207 0.001370 0.011388 0.008784 0.008124 0.009648
GHIJKML 0.002949 0.001312 0.004805 0.007470 0.006875 0.008072
GHIJ 0.004019 0.001206 0.007819 0.006842 0.006254 0.007531
GH 0.003661 0.001301 0.007815 0.005736 0.004888 0.006576
IJ 0.002816 0.001221 0.004703 0.004288 0.003402 0.005069
KML 0.005265 0.001526 0.011869 0.007136 0.006396 0.007818
ML 0.003623 0.001746 0.006083 0.005493 0.004730 0.006322
FED 0.007641 0.001909 0.015944 0.007622 0.006408 0.009094
ED 0.007927 0.003280 0.012951 0.003397 0.002486 0.004603
ABC 0.003319 0.001613 0.004924 0.005775 0.004941 0.006624
BC 0.004173 0.001544 0.008122 0.004244 0.002644 0.005407
> Estimated migration rates:
source destination M 2.5% HPD 97.5% HPD
A K 0.020387 0.000424 0.050384
K A 0.026029 0.001426 0.063409
> Proposal results:
node 1 GDI 1 2.5% HPD 97.5% HPD node 2 GDI 2 2.5% HPD 97.5% HPD merge accepted?
B 0.75 0.60 0.88 C 0.91 0.82 0.98 True
E 0.70 0.53 0.83 D 0.74 0.60 0.90 True
G 0.82 0.70 0.92 H 0.65 0.56 0.75 True
I 0.96 0.90 0.99 J 0.46 0.35 0.57 True
M 0.38 0.31 0.46 L 0.41 0.35 0.49 True
Number of species after iteration 1: 8
'GH', 'IJ', 'K', 'ML', 'F', 'ED', 'A', 'BC'
/-GH
/-|
| \-IJ
/-|
| | /-K
| \-|
/-| \-ML
| |
| | /-F
--| \-|
| \-ED
|
| /-A
\-|
\-BC
*** Iteration 2 ***
> Estimated tau and theta parameters:
theta 2.5% HPD 97.5% HPD tau 2.5% HPD 97.5% HPD
A 0.019423 0.014846 0.024536
F 0.012503 0.009792 0.014723
BC 0.009445 0.007083 0.011587
K 0.003554 0.001857 0.005175
ED 0.011261 0.008951 0.013957
GH 0.018517 0.014809 0.022038
IJ 0.012985 0.010811 0.015860
ML 0.047868 0.040691 0.055701
GHIJKMLFEDABC 0.024941 0.016036 0.035394 0.009370 0.008726 0.009999
GHIJKMLFED 0.006192 0.000984 0.015912 0.009056 0.008423 0.009740
GHIJKML 0.003040 0.001391 0.004958 0.007464 0.006817 0.008069
GHIJ 0.003262 0.001336 0.005345 0.006531 0.005751 0.007224
KML 0.008480 0.004010 0.013369 0.004713 0.003427 0.005798
FED 0.008241 0.001851 0.020635 0.008086 0.007063 0.009031
ABC 0.003402 0.001540 0.005335 0.006047 0.005219 0.007100
> Estimated migration rates:
source destination M 2.5% HPD 97.5% HPD
A K 0.027485 0.001243 0.068456
K A 0.025404 0.001105 0.060067
inferring gdi for 'ML' and sister 'K' using gene tree simulation...
> Proposal results:
node 1 GDI 1 2.5% HPD 97.5% HPD node 2 GDI 2 2.5% HPD 97.5% HPD merge accepted?
A 0.43 0.35 0.53 BC 0.71 0.60 0.80 True
F 0.73 0.65 0.81 ED 0.76 0.68 0.85 True
GH 0.51 0.43 0.60 IJ 0.64 0.56 0.70 True
K 0.86 0.71 0.95 ML 0.17 0.11 0.22 True
Number of species after iteration 2: 4
'GHIJ', 'KML', 'FED', 'ABC'
/-GHIJ
/-|
/-| \-KML
| |
--| \-FED
|
\-ABC
*** Iteration 3 ***
> Estimated tau and theta parameters:
theta 2.5% HPD 97.5% HPD tau 2.5% HPD 97.5% HPD
ABC 0.024211 0.020701 0.029189
FED 0.024968 0.020960 0.027978
KML 0.048020 0.042756 0.053195
GHIJ 0.032583 0.028009 0.036636
GHIJKMLFEDABC 0.028708 0.016694 0.040018 0.009920 0.008810 0.011230
GHIJKMLFED 0.004528 0.001216 0.008639 0.009515 0.008454 0.010673
GHIJKML 0.003111 0.001728 0.004438 0.006884 0.006244 0.007419
> Estimated migration rates:
source destination M 2.5% HPD 97.5% HPD
ABC KML 0.036907 0.004886 0.078463
KML ABC 0.020458 0.001249 0.048343
inferring gdi for 'KML' and sister 'GHIJ' using gene tree simulation...
> Proposal results:
node 1 GDI 1 2.5% HPD 97.5% HPD node 2 GDI 2 2.5% HPD 97.5% HPD merge accepted?
GHIJ 0.34 0.29 0.39 KML 0.24 0.2 0.27 True
Number of species after iteration 3: 3
'GHIJKML', 'FED', 'ABC'
/-GHIJKML
/-|
--| \-FED
|
\-ABC
*** Iteration 4 ***
Traceback (most recent call last):
File "/home/daniel/anaconda3/bin/hhsd", line 8, in <module>
sys.exit(run())
File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/hhsd.py", line 57, in run
hhsd(cf_path, cf_override)
File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/hhsd.py", line 46, in hhsd
tree = HA_iteration(tree, bpp_ctl, cf)
File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/module_HA.py", line 177, in HA_iteration
estimated_param = MSCNumericParamEstimates(BPP_outfile="proposal_bpp_out.txt", BPP_mcmcfile="proposal_bpp_mcmc.txt")
File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/module_bpp_readres.py", line 322, in __init__
map_dict_long, map_dict_numeric = get_node_number_map(BPP_outfile)
File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/module_bpp_readres.py", line 122, in get_node_number_map
relevant_index = lines.index("List of nodes, taus and thetas:") # find the line where the node labels are listed
ValueError: 'List of nodes, taus and thetas:' is not in list
I will investigate the source of this error.
Thank you for looking into this matter and your help with my other questions.
The issue was related to a segmentation fault in the BPP 4.6.2 executable bundled with the program! There are two fixes:
I changed the run_bpp_A00
function within the program to monitor seperately for errors in BPP. This should mean that potential issues with BPP make hhsd
fail more gracefully.
# run BPP with a given control file, and capture the stdout results
def run_BPP_A00(
control_file: BppCfile,
) -> None: # handles the bpp subprocess, which outputs a file
'''
Handles the starting and stopping of the C program BPP, which is used to infer MSC parameters.
'''
bpp_completed = False
while not bpp_completed:
# flag activated when numeric scaling is turned on
restart = False
# runs BPP in a dedicated subprocess
process = subprocess.Popen(
f"{get_bundled_bpp_path()} --cfile {control_file}",
shell = True,
bufsize = 1,
stdout = subprocess.PIPE,
stderr = subprocess.STDOUT,
encoding = 'utf-8',
errors = 'replace'
)
# monitors the output of the process
while True:
output_line = process.stdout.readline()
# if the output is non-empty
if output_line:
# check that numeric scaling is needed, and activate if it is. This will restart bpp with the new control file
# numeric scaling is not active by default because it slows bpp considerably.
if "[ERROR] log-L for locus" in output_line:
print('Restarting BPP with numerical scaling')
file1 = open(control_file, "a") # append mode
file1.write("scaling=1")
file1.close()
restart = True
break
# check if bpp errored with a given error message
elif "[ERROR]" in output_line:
print("#", output_line)
sys.exit(f"BppError: BPP failed. Check control file independently using the 'bpp' command")
# check if bpp gave a segfault
elif "core dumped" in output_line:
print("#", output_line)
sys.exit("BppError: BPP failed with segfault. Check control file independently using the 'bpp' command, and contact BPP developers if issue persists")
# print the current progress indicator
progress = re.findall("-*\d\d%", output_line)
if len(progress) == 1:
print(f'BPP progress: {progress[0]} ', end='\r')
# exit if process has stopped
if (output_line == '') and (process.poll() != None):
break
if restart == False:
process.wait() # check again that process has stopped
print("> Finished BPP run ")
bpp_completed = True
Let me know if there are any more issues.
Hi, I am trying to run a merge analysis, but I am getting this error (see image attached). I have done this analysis with succes using other data, but I don't know what is happening with this new data. I will really appreciate any help you can provide me