arzwa / Beluga.jl

Statistical analysis of gene family evolution using phylogenetic birth-death processes and WGD inference using rjMCMC
GNU Lesser General Public License v2.1
9 stars 2 forks source link
bayesian-inference bioinformatics evolution gene-family phylogenetics reversible-jump-mcmc

Copyright (C) 2020 Arthur Zwaenepoel

VIB/UGent center for plant systems biology - Bioinformatics & evolutionary genomics group

Beluga

Dependencies

Beluga requires unregistered dependencies, to install Beluga, fire up a julia session, hit ] and add the following:

(v1.3) pkg> add https://github.com/arzwa/NewickTree.jl#beluga
(v1.3) pkg> add https://github.com/arzwa/AdaptiveMCMC.jl
(v1.3) pkg> add https://github.com/arzwa/Beluga.jl

Usage

Please refer to the documentation for more detailed usage examples.

using Beluga, CSV, DataFrames, Parameters

# get some data
tree = readline("example/9dicots/9dicots.nw")
df = CSV.read("example/9dicots/9dicots-f01-25.csv")

# construct model and profile
λ, μ, η = 1.0, 0.92, 0.66
model, profile = DLWGD(tree, df, λ, μ, η)  

# compute log-likelihood
l = logpdf!(model, profile)

# get model parameters as vector = [λ1, ..., μ1, ..., q1, ..., η])
v = asvector(model)

# construct new model based on the old one and a new parameter vector
x = rand(length(v))
model = model(x)

# compute log likelihood again
l = logpdf!(model, profile)

# change parameters at node 5
Beluga.update!(model[5], (λ=1.5, μ=1.2))

# change η parameter at root
Beluga.update!(model[1], :η, 0.91)

# recompute likelihood efficiently starting from node 5
l = logpdf!(model[5], profile)

# gradient
g = gradient(model, profile)

# add a WGD node above node 6 at a distance 0.01 with q=0.25
addwgd!(model, model[6], 0.01, 0.25)
Beluga.extend!(profile, 6);

# compute the log-likelihood, now for the model with the WGD
logpdf!(model, profile)

# simulate a random phylogenetic profile under the model
rand(model)

# simulate a data set of 100 profiles
rand(model, 100)

# independent rates prior (check & adapt default settings!)
prior = IRRevJumpPrior(Tl=treelength(model))
logpdf(prior, model)

# sample random model from prior
@unpack model, Σ = rand(prior, model)

# reversible jump chain
model, profile = DLWGD(tree, df, λ, μ, η)  
chain = RevJumpChain(data=profile, model=model, prior=prior)

# run chain (fixed dimension - no reversible jump)
init!(chain)
mcmc!(chain, 100, show=1)

# run chain (variable dimensions - with reversible jump)
rjmcmc!(chain, 100, show=1)

Notes:

sed -i 's/NANSAFE_MODE_ENABLED = false/NANSAFE_MODE_ENABLED = true/g' ~/.julia/packages/ForwardDiff/*/src/prelude.jl

Citation

Beluga.jl is developed by Arthur Zwaenepoel at the VIB/UGent center for plant systems biology (bioinformatics and evolutionary genomics group). A preprint on the reversible-jump sampler for WGD inference implemented in this library can be found at BioRXiv.

@article {zwaenepoel2020,
    author = {Zwaenepoel, Arthur and Van de Peer, Yves},
    title = {Model-based detection of whole-genome duplications in a phylogeny},
    elocation-id = {2020.01.24.917997},
    year = {2020},
    doi = {10.1101/2020.01.24.917997},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2020/01/25/2020.01.24.917997},
    eprint = {https://www.biorxiv.org/content/early/2020/01/25/2020.01.24.917997.full.pdf},
    journal = {bioRxiv}
}