abacus-gene / hhsd

HHSD is a python pipeline for hierarchical heuristic species delimitation using genomic sequence data under the multispecies coalescent model. It drives data analysis using the program bpp.
GNU General Public License v3.0
2 stars 0 forks source link

List of nodes, taus and thetas #6

Closed nicolashazzi23 closed 1 day ago

nicolashazzi23 commented 1 month ago

Hi, I am trying to run a merge analysis, but I am getting this error (see image attached). I have done this analysis with succes using other data, but I don't know what is happening with this new data. I will really appreciate any help you can provide me

hhds

dkornai commented 1 month ago

Hi, sorry for the issue. Can you provide your control file, Imap file, and sequence data?

JimStarrett commented 2 weeks ago

I seem to have this same error message when attempting to test for migration between non-sister taxa, or across clade groups.

dkornai commented 2 weeks ago

I seem to have this same error message when attempting to test for migration between non-sister taxa, or across clade groups.

Please supply your sequence data, control file, and Imap file, so we can reproduce the issue.

dkornai commented 3 days ago

@JimStarrett provided some sample files in a comment this other issue (https://github.com/abacus-gene/hhsd/issues/11#issuecomment-2241326549).

I ran a slightly modified control file that shortens the runtime (by reducing the number of samples, and the number of loci used in the calculations, as well as manually setting tau and theta priors to avoid their automatic inference). The acceptance thresholds are also set in a way where all modifications will be accepted. This is mainly useful to check if the program crashes at any point in the delimitation procedure.

# output folder
output_directory = res_m

# input files
Imapfile = Aty_hhsd_60lump_IMAP.txt
seqfile  = Atypoides_hhsd_935p_phased_renamed.phy

# guide tree
guide_tree = (((((G,H),(I,J)),(K,(M,L))),(F,(E,D))),(A,(B,C)));

# migration
migration = {A <-> K}

# migration prior
migprior = 2 20

# hierarchical algorithm settings
mode = merge
gdi_threshold = <=1.0, <=1.0

# BPP MCMC settings
threads = 8
burnin =  500
nsample = 1000
nloci = 8
tauprior = 3 0.01
thetaprior = 3 0.01 e

In this control file, the A and K groups, between which migration should be estimated, are not sisters.

Interestingly, the program seems to run for a couple iterations, but crashes at the point where we have gone down to three species:

< Analysis started >

> Starting state of merge analysis

Number of species in starting delimitation:  13
'G', 'H', 'I', 'J', 'K', 'M', 'L', 'F', 'E', 'D', 'A', 'B', 'C'

               /-G
            /-|
           |   \-H
         /-|
        |  |   /-I
        |   \-|
      /-|      \-J
     |  |
     |  |   /-K
     |   \-|
   /-|     |   /-M
  |  |      \-|
  |  |         \-L
  |  |
  |  |   /-F
--|   \-|
  |     |   /-E
  |      \-|
  |         \-D
  |
  |   /-A
   \-|
     |   /-B
      \-|
         \-C

*** Iteration 1 ***

> Estimated tau and theta parameters:

                  theta  2.5% HPD  97.5% HPD       tau  2.5% HPD  97.5% HPD
A              0.019631  0.014933   0.023459                               
F              0.012701  0.010541   0.015283                               
B              0.006156  0.004449   0.008592                               
C              0.003362  0.002219   0.004725                               
K              0.003785  0.002196   0.006243                               
E              0.005623  0.004123   0.007226                               
D              0.005038  0.003299   0.007058                               
G              0.006727  0.004292   0.008853                               
H              0.010986  0.008692   0.013645                               
I              0.002549  0.001584   0.003504                               
J              0.013883  0.009611   0.017134                               
M              0.023535  0.019284   0.027940                               
L              0.021137  0.016125   0.024779                               
GHIJKMLFEDABC  0.026594  0.015709   0.037058  0.009110  0.008461   0.009823
GHIJKMLFED     0.005207  0.001370   0.011388  0.008784  0.008124   0.009648
GHIJKML        0.002949  0.001312   0.004805  0.007470  0.006875   0.008072
GHIJ           0.004019  0.001206   0.007819  0.006842  0.006254   0.007531
GH             0.003661  0.001301   0.007815  0.005736  0.004888   0.006576
IJ             0.002816  0.001221   0.004703  0.004288  0.003402   0.005069
KML            0.005265  0.001526   0.011869  0.007136  0.006396   0.007818
ML             0.003623  0.001746   0.006083  0.005493  0.004730   0.006322
FED            0.007641  0.001909   0.015944  0.007622  0.006408   0.009094
ED             0.007927  0.003280   0.012951  0.003397  0.002486   0.004603
ABC            0.003319  0.001613   0.004924  0.005775  0.004941   0.006624
BC             0.004173  0.001544   0.008122  0.004244  0.002644   0.005407

> Estimated migration rates:

source destination        M  2.5% HPD  97.5% HPD
     A           K 0.020387  0.000424   0.050384
     K           A 0.026029  0.001426   0.063409

> Proposal results:

node 1  GDI 1  2.5% HPD  97.5% HPD node 2  GDI 2  2.5% HPD  97.5% HPD  merge accepted?
     B   0.75      0.60       0.88      C   0.91      0.82       0.98             True
     E   0.70      0.53       0.83      D   0.74      0.60       0.90             True
     G   0.82      0.70       0.92      H   0.65      0.56       0.75             True
     I   0.96      0.90       0.99      J   0.46      0.35       0.57             True
     M   0.38      0.31       0.46      L   0.41      0.35       0.49             True

Number of species after iteration 1:  8
'GH', 'IJ', 'K', 'ML', 'F', 'ED', 'A', 'BC'

            /-GH
         /-|
        |   \-IJ
      /-|
     |  |   /-K
     |   \-|
   /-|      \-ML
  |  |
  |  |   /-F
--|   \-|
  |      \-ED
  |
  |   /-A
   \-|
      \-BC

*** Iteration 2 ***

> Estimated tau and theta parameters:

                  theta  2.5% HPD  97.5% HPD       tau  2.5% HPD  97.5% HPD
A              0.019423  0.014846   0.024536                               
F              0.012503  0.009792   0.014723                               
BC             0.009445  0.007083   0.011587                               
K              0.003554  0.001857   0.005175                               
ED             0.011261  0.008951   0.013957                               
GH             0.018517  0.014809   0.022038                               
IJ             0.012985  0.010811   0.015860                               
ML             0.047868  0.040691   0.055701                               
GHIJKMLFEDABC  0.024941  0.016036   0.035394  0.009370  0.008726   0.009999
GHIJKMLFED     0.006192  0.000984   0.015912  0.009056  0.008423   0.009740
GHIJKML        0.003040  0.001391   0.004958  0.007464  0.006817   0.008069
GHIJ           0.003262  0.001336   0.005345  0.006531  0.005751   0.007224
KML            0.008480  0.004010   0.013369  0.004713  0.003427   0.005798
FED            0.008241  0.001851   0.020635  0.008086  0.007063   0.009031
ABC            0.003402  0.001540   0.005335  0.006047  0.005219   0.007100

> Estimated migration rates:

source destination        M  2.5% HPD  97.5% HPD
     A           K 0.027485  0.001243   0.068456
     K           A 0.025404  0.001105   0.060067
inferring gdi for 'ML' and sister 'K' using gene tree simulation...                                    
> Proposal results:

node 1  GDI 1  2.5% HPD  97.5% HPD node 2  GDI 2  2.5% HPD  97.5% HPD  merge accepted?
     A   0.43      0.35       0.53     BC   0.71      0.60       0.80             True
     F   0.73      0.65       0.81     ED   0.76      0.68       0.85             True
    GH   0.51      0.43       0.60     IJ   0.64      0.56       0.70             True
     K   0.86      0.71       0.95     ML   0.17      0.11       0.22             True

Number of species after iteration 2:  4
'GHIJ', 'KML', 'FED', 'ABC'

         /-GHIJ
      /-|
   /-|   \-KML
  |  |
--|   \-FED
  |
   \-ABC

*** Iteration 3 ***

> Estimated tau and theta parameters:

                  theta  2.5% HPD  97.5% HPD       tau  2.5% HPD  97.5% HPD
ABC            0.024211  0.020701   0.029189                               
FED            0.024968  0.020960   0.027978                               
KML            0.048020  0.042756   0.053195                               
GHIJ           0.032583  0.028009   0.036636                               
GHIJKMLFEDABC  0.028708  0.016694   0.040018  0.009920  0.008810   0.011230
GHIJKMLFED     0.004528  0.001216   0.008639  0.009515  0.008454   0.010673
GHIJKML        0.003111  0.001728   0.004438  0.006884  0.006244   0.007419

> Estimated migration rates:

source destination        M  2.5% HPD  97.5% HPD
   ABC         KML 0.036907  0.004886   0.078463
   KML         ABC 0.020458  0.001249   0.048343
inferring gdi for 'KML' and sister 'GHIJ' using gene tree simulation...                                    
> Proposal results:

node 1  GDI 1  2.5% HPD  97.5% HPD node 2  GDI 2  2.5% HPD  97.5% HPD  merge accepted?
  GHIJ   0.34      0.29       0.39    KML   0.24       0.2       0.27             True

Number of species after iteration 3:  3
'GHIJKML', 'FED', 'ABC'

      /-GHIJKML
   /-|
--|   \-FED
  |
   \-ABC

*** Iteration 4 ***
Traceback (most recent call last):                         
  File "/home/daniel/anaconda3/bin/hhsd", line 8, in <module>
    sys.exit(run())
  File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/hhsd.py", line 57, in run
    hhsd(cf_path, cf_override)
  File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/hhsd.py", line 46, in hhsd
    tree = HA_iteration(tree, bpp_ctl, cf)
  File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/module_HA.py", line 177, in HA_iteration
    estimated_param = MSCNumericParamEstimates(BPP_outfile="proposal_bpp_out.txt", BPP_mcmcfile="proposal_bpp_mcmc.txt")
  File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/module_bpp_readres.py", line 322, in __init__
    map_dict_long, map_dict_numeric = get_node_number_map(BPP_outfile)
  File "/home/daniel/anaconda3/lib/python3.9/site-packages/hhsd/module_bpp_readres.py", line 122, in get_node_number_map
    relevant_index = lines.index("List of nodes, taus and thetas:") # find the line where the node labels are listed
ValueError: 'List of nodes, taus and thetas:' is not in list

I will investigate the source of this error.

JimStarrett commented 2 days ago

Thank you for looking into this matter and your help with my other questions.

dkornai commented 1 day ago

The issue was related to a segmentation fault in the BPP 4.6.2 executable bundled with the program! There are two fixes: