PoonLab / vindels

Developing an empirical model of sequence insertion and deletion in virus genomes

Missing data set #103

Open ArtPoon opened 1 year ago

ArtPoon commented 1 year ago

Looks like data set 101034 failed to converge. Not surprising, because it contains over 1,000 sequences! There is a strong clock signal, though. I have generated three random sub-samples of 500 sequences each, which is probably near the upper limit of what will converge in a reasonable time. I generated BEAST XML files for them and ran each for 2e8 steps. The chains have barely converged, largely because of the treeLikelihood (again not surprising, because the trees are very large and the starting trees are randomly simulated under the coalescent).
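One simple way to draw such sub-samples is sketched below in Python. This is an illustration under assumptions, not the script used for this data set: it assumes the alignment is stored as FASTA, and it draws each replicate independently without replacement, so different replicates may share sequences. All function names are hypothetical.

```python
import random


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def subsample(records, size, n_replicates, seed=None):
    """Draw n_replicates independent random subsets of `size` records.

    Each subset is sampled without replacement, but subsets are drawn
    independently of each other, so they can overlap.
    """
    if size > len(records):
        raise ValueError("sample size exceeds number of records")
    rng = random.Random(seed)  # seeded RNG makes replicates reproducible
    return [rng.sample(records, size) for _ in range(n_replicates)]


def to_fasta(records):
    """Serialize (header, sequence) tuples back to FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

If disjoint sub-samples are preferred, shuffle once and take consecutive slices instead; note that three disjoint sets of 500 would require at least 1,500 sequences.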

Consequently, I manually edited one of the XMLs to lengthen the chain to 5e8 steps (scaling the logging intervals to match) and to use a UPGMA tree as the startingTree:

```
[jpalmer@BEVi Desktop]$ diff 101034_0-b.constant.xml 101034_0-a.constant.xml 
517c517
<   
---
> <!--  
521a522,529
> -->
>         <upgmaTree id="startingTree" rootHeight="25">
>             <distanceMatrix correction="JC">
>                 <patterns>
>                     <alignment idref="alignment"/>
>                 </patterns>
>             </distanceMatrix>
>         </upgmaTree>    
676c684
<   <mcmc id="mcmc" chainLength="200000000" autoOptimize="true" operatorAnalysis="101034_0-b_const.ops">
---
>   <mcmc id="mcmc" chainLength="500000000" autoOptimize="true" operatorAnalysis="101034_0-a_const500.ops">
731c739
<       <log id="fileLog" logEvery="20000" fileName="101034_0-b_const.log" overwrite="true">
---
>       <log id="fileLog" logEvery="50000" fileName="101034_0-a_const500.log" overwrite="true">
752c760
<       <logTree id="treeFileLog" logEvery="200000" nexusFormat="true" fileName="101034_0-b_const.time.trees" sortTranslationTable="true">
---
>       <logTree id="treeFileLog" logEvery="500000" nexusFormat="true" fileName="101034_0-a_const500.time.trees" sortTranslationTable="true">
765c773
< </beast>
\ No newline at end of file
---
> </beast>
```
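The same edits could be applied to the other sub-sample XMLs with a short script rather than by hand. The Python sketch below uses the standard-library `xml.etree.ElementTree` and is an assumption-laden illustration, not a tested BEAST tool: it replaces whichever element carries `id="startingTree"` with a `<upgmaTree>` built like the one in the diff (reusing the same `id`, as the diff does), sets the new `chainLength`, scales `logEvery` by the same factor for file-writing loggers only (the diff leaves other loggers alone), and optionally swaps an old file-name stem for a new one. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_upgma_starting_tree(root_height="25", alignment_id="alignment"):
    """Build a <upgmaTree> element with the same structure as in the diff."""
    tree = ET.Element("upgmaTree", {"id": "startingTree", "rootHeight": root_height})
    dist = ET.SubElement(tree, "distanceMatrix", {"correction": "JC"})
    patterns = ET.SubElement(dist, "patterns")
    ET.SubElement(patterns, "alignment", {"idref": alignment_id})
    return tree


def replace_starting_tree(root, new_tree):
    """Swap the first element with id="startingTree" for new_tree, in place."""
    for parent in root.iter():
        for i, child in enumerate(list(parent)):
            if child.get("id") == "startingTree":
                new_tree.tail = child.tail  # keep surrounding whitespace
                parent.remove(child)
                parent.insert(i, new_tree)
                return True  # stop before the modified tree is iterated further
    return False


def lengthen_chain(root, new_length, old_stem=None, new_stem=None):
    """Set chainLength and scale file-logger intervals proportionally."""
    mcmc = root.find(".//mcmc")
    if mcmc is None:
        raise ValueError("no <mcmc> element found")
    factor = new_length / int(mcmc.get("chainLength"))
    mcmc.set("chainLength", str(new_length))
    for el in root.iter():
        # Only loggers that write to a file are rescaled.
        if el.tag in ("log", "logTree") and el.get("fileName") and el.get("logEvery"):
            el.set("logEvery", str(round(int(el.get("logEvery")) * factor)))
    if old_stem and new_stem:
        for el in root.iter():
            for attr in ("fileName", "operatorAnalysis"):
                value = el.get(attr)
                if value:
                    el.set(attr, value.replace(old_stem, new_stem))


def edit_file(src, dst, new_length, old_stem=None, new_stem=None):
    """Apply both edits to a BEAST XML file and write the result to dst."""
    tree = ET.parse(src)
    root = tree.getroot()
    if not replace_starting_tree(root, make_upgma_starting_tree()):
        raise ValueError("no element with id='startingTree' found")
    lengthen_chain(root, new_length, old_stem, new_stem)
    tree.write(dst, encoding="utf-8", xml_declaration=True)
```

For example, `edit_file("101034_0-b.constant.xml", "101034_0-a.constant.xml", 500_000_000, "101034_0-b_const", "101034_0-a_const500")` would reproduce the chain-length, logging and file-name changes above. One design difference: ElementTree's default parser discards XML comments, so this replaces the coalescent starting tree outright instead of commenting it out, and any other comments or formatting in the file will not survive the round trip.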
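For intuition about what the new `<upgmaTree>` element computes, here is a generic textbook sketch of UPGMA clustering on Jukes-Cantor (JC69) corrected distances, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. It is not BEAST's implementation: skipping sites with gaps or ambiguity codes is an assumption, the `rootHeight` attribute is not modelled, and all names are hypothetical.

```python
import math


def jc_distance(a, b):
    """JC69-corrected distance between two aligned sequences.

    Sites where either sequence is not A/C/G/T are skipped (an assumption).
    """
    valid = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            valid += 1
            diffs += x != y
    if valid == 0:
        raise ValueError("no comparable sites")
    p = diffs / valid
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(seqs):
    """Cluster a {name: aligned sequence} dict; return a Newick string.

    Node height is half the merge distance; branch length is the
    difference between parent and child heights.
    """
    names = list(seqs)
    # cluster key -> (newick, number of tips, height); internal keys are ints
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {frozenset((x, y)): jc_distance(seqs[x], seqs[y])
         for i, x in enumerate(names) for y in names[i + 1:]}
    next_id = 0
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of current clusters
        a, b = sorted(pair, key=str)
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        h = d.pop(pair) / 2
        new, next_id = next_id, next_id + 1
        for other in clusters:
            # size-weighted average of distances to the merged clusters
            d[frozenset((new, other))] = (
                sa * d.pop(frozenset((a, other)))
                + sb * d.pop(frozenset((b, other)))
            ) / (sa + sb)
        clusters[new] = (f"({na}:{h - ha:.4g},{nb}:{h - hb:.4g})", sa + sb, h)
    (newick, _, _), = clusters.values()
    return newick + ";"
```

Because UPGMA assumes a constant rate across lineages, it yields an ultrametric tree, which is one reason it is a more sensible starting point than a random coalescent tree when the data have a strong clock signal.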