CompEvol / beast2

Bayesian Evolutionary Analysis by Sampling Trees
www.beast2.org
GNU Lesser General Public License v2.1

Feature request: partial monophyly constraints #1153

Open seraklop opened 5 months ago

seraklop commented 5 months ago

Would it be possible to add partial monophyly constraints, that is, monophyly enforced only for a subset of taxa, while the rest are free to go anywhere in the tree? This can for instance be done in MrBayes with "constraint myconstraint partial = taxonlist1 : taxonlist2".

This would help considerably with datasets that include fossils, which often carry too little information to place them with enough certainty for a hard monophyly constraint. One example is rooting constraints that apply only to the extant taxa, while the fossils are left free to go anywhere.

Another example is heterogeneous datasets, where genomic data is available only for a few backbone taxa and the information for the rest is too limited to enforce monophyly.

rbouckaert commented 5 months ago

Perhaps the MRCAPriorWithRogues is what you are looking for. It is in the BEASTLabs package. Using it requires a bit of XML editing. The easiest route is probably to set up an MRCA prior in BEAUti and specify the taxa in taxonlist1. Save the file, and open the XML in a text editor. Go to the MRCA prior, which looks something like this:

                <distribution id="taxonlist1.prior" spec="beast.base.evolution.tree.MRCAPrior" tree="@Tree.t:dna" monophyletic="true">
                    <taxonset id="taxonlist1" spec="TaxonSet">
                        <taxon id="Carp" spec="Taxon"/>
                        <taxon id="Chicken" spec="Taxon"/>
                        <taxon id="Cow" spec="Taxon"/>
                    </taxonset>
                </distribution>

Replace the spec attribute with beastlabs.math.distributions.MRCAPriorWithRogues and add the rogues in taxonlist2, so it looks like this:

                <distribution id="taxonlist1.prior" spec="beastlabs.math.distributions.MRCAPriorWithRogues" tree="@Tree.t:dna" monophyletic="true">
                    <taxonset id="taxonlist1" spec="TaxonSet">
                        <taxon id="Carp" spec="Taxon"/>
                        <taxon id="Chicken" spec="Taxon"/>
                        <taxon id="Cow" spec="Taxon"/>
                    </taxonset>
                    <rogues id="taxonlist2" spec="TaxonSet">
                        <taxon id="Dog" spec="Taxon"/>
                        <taxon id="Dolphin" spec="Taxon"/>
                        <taxon id="Duck" spec="Taxon"/>
                    </rogues>
                </distribution>

You need to have the BEASTLabs package installed to run the XML. Hope this is what you had in mind.

seraklop commented 5 months ago

Dear Remco,

great, thanks so much! Of course that is exactly what we need. However, now initialization apparently does not work anymore with the NJ starting tree. Is it possible that this function only works with hard constraints and not with such partial constraints?

Thanks a lot for your help, Seraina

rbouckaert commented 5 months ago

Hi Seraina,

The ClusterTree indeed does not pick up such constraints, but the ConstrainedClusterTree does. Can you try changing spec="ClusterTree" to spec="beastlabs.evolution.tree.ConstrainedClusterTree" on the starting-tree XML element and see whether the analysis starts?
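
As a rough sketch of that swap, assuming the starting tree was set up by BEAUti: only the spec value below comes from this thread. The ids (ClusterTree.t:dna, Tree.t:dna, dna) and the clusterType value are placeholders for whatever your file already contains, and the nested constraint reference is an assumption about how the ConstrainedClusterTree is told about the prior, so check it against the BEASTLabs documentation before relying on it.

```xml
<!-- Ids and clusterType are placeholders; keep the values already in your file. -->
<init id="ClusterTree.t:dna"
      spec="beastlabs.evolution.tree.ConstrainedClusterTree"
      clusterType="neighborjoining"
      initial="@Tree.t:dna"
      taxa="@dna">
    <!-- Assumption: the prior is passed by reference under the input name "constraint". -->
    <constraint idref="taxonlist1.prior"/>
</init>
```

The idref points at the id of the MRCAPriorWithRogues element shown earlier, so the prior itself stays defined in one place.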

Cheers, Remco

AlexaViert commented 5 months ago

Dear Remco

Thank you for your help so far. Seraina and I tried the ConstrainedClusterTree, but now the analysis will not initialize because it detects negative branch lengths. Do you know what the issue could be? We have already tried different seeds.

Best, Alexandra

rbouckaert commented 5 months ago

Hi Alexandra,

The ConstrainedClusterTree has a minimum branch length option (defaults to minBranchLength="1e-10") that should guarantee some minimum branch length. Obviously, this is not working for your analysis. If you can send me the XML I can have a look at what is causing the problem.

Have you tried using a RandomTree for initialisation? This is recommended in general in order to not bias the MCMC sample by a fixed starting point.
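
For reference, a BEAUti-style RandomTree initialiser usually looks something like the sketch below. The ids and the population size of 1.0 are placeholders, and the short spec names assume the default namespace of a BEAUti-generated file. Whether RandomTree honours the rogue-aware prior is not confirmed in this thread, so treat this only as a starting point.

```xml
<!-- Ids and the popSize value are placeholders; adapt them to your own file. -->
<init id="RandomTree.t:dna" spec="RandomTree" estimate="false"
      initial="@Tree.t:dna" taxa="@dna">
    <!-- The random starting tree is simulated under a constant-size coalescent. -->
    <populationModel id="ConstantPopulation0.t:dna" spec="ConstantPopulation">
        <parameter id="randomPopSize.t:dna" spec="parameter.RealParameter" name="popSize">1.0</parameter>
    </populationModel>
</init>
```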

Cheers, Remco

seraklop commented 5 months ago

Dear Remco, thanks so much for all your help. Alexandra is in the last weeks of her PhD, and your input has already greatly helped us speed up the process! We are now using both a random tree and a user tree as starting trees and comparing the outcomes. So for now we have a workaround for the issue with the ConstrainedClusterTree. (It is a dataset with a large number of highly incomplete fossils and thus a lot of uncertainty in the topology, which makes convergence very slow, hence our need for a good starting tree as well.) If you would still like our input files to reproduce the issue, let us know. Otherwise, from our side, this could be closed. Thanks again! Seraina