Based on this suggestion by Remco,
we can correctly account for constant sites in a BEAST2 analysis by adding the
following data
tag below your current data
tag:
<data id="xyz" spec="FilteredAlignment" filter="-" data="@xyzOriginal" constantSiteWeights="100 200 300 400"/>
This assumes that your original <data>
tag had id=xyz
and was renamed
to id=xyzOriginal
, and that you have 1000 constant sites that were
removed from the alignment, with:
Other phylogenetic software allow you to also account for constant sites if
you are only supplying an alignment with variable sites. As of version 0.3.0
,
beast2_constsites
also supports IQTree as well as
BEAST2
.
If you would like support for other tools, please file a GitHub issue, and specify how the tool would require the information.
This script will take a FASTA file with a single DNA sequence (e.g., a
bacterial chromosome), a VCF file containing the position of
SNPs along the FASTA file (e.g., as outputted from snippy-core)
and the XML file produced by BEAUTi containing only variable sites. It will
output a new XML file named <original_xml_name>_plus_const.xml
with the
added information to account for constant sites. There is nothing else you
need to do but run BEAST2.
It will optionally also take a BED file with positions to mask (e.g., positions of phage).
beast2_constsites
will take the same FASTA, VCF, and optionally the BED file,
as described above, and will print out the iqtree
command's -fconst
option
with the appropriate numbers and format. You can then copy paste the relevant
portion to form the complete iqtree
command.
pip3 install b2constsites
If installed correctly, a scripted called run_b2cs
should be available in
your path:
run_b2cs --help
At a minimum, you need to supply a sequence file with your reference (assuming it has a single chromosome entry --- this was designed for bacterial genomics, but may work with viral too), a VCF file with variants, and the XML output from BEAUTi.
run_b2cs -x myxml.xml beast2 myref.fasta myvar.vcf
A new file called myxml_plus_const.xml
will be created in the same folder as
myxml.xml
.
At a minimum, you need to supply a sequence file with your reference (assuming it has a single chromosome entry --- this was designed for bacterial genomics, but may work with viral too), and a VCF file with variants.
run_b2cs iqtree myref.fasta myvar.vcf
This script will:
The output will be, therefore, an approximation. However, it should be a close enough approximation that it will provide a better inference from BEAST2 and IQTree than if one uses only variable sites, and then corrects in some post hoc manner.
Anders Gonçalves da Silva
Sarah Baines
Jean Lee
Torsten Seemann
Anders Gonçalves da Silva