Open simone66b opened 2 years ago
Hi, I can try to answer some of the smaller questions, though I haven't gone in on your long code chunk.
First, how do I find the root node?
julia> using Phylo
julia> tree = parsenewick("((a:1.0, (b:1.0, c:1.0):1.0):1.0, d:1.0);")
RootedTree with 4 tips, 7 nodes and 6 branches.
Leaf names are c, b, a and d
julia> getroot(tree)
LinkNode Node 7, a root node with 2 outbound connections (branches 5, 6)
How do I get an iterator over the branches?
Do you mean the nodes?
julia> iter = nodeiter(tree)
Phylo.NodeIterator{RootedTree}(RootedTree with 4 tips, 7 nodes and 6 branches.
Leaf names are c, b, a and d
, nothing)
julia> first(iter)
LinkNode a, a tip of the tree with an incoming connection (branch 4).
Or if you do want to collect them into a vector (you usually shouldn't have to)
julia> vec = collect(iter)
7-element Vector{LinkNode{OneRoot, String, Dict{String, Any}, LinkBranch{OneRoot, String, Dict{String, Any}, Float64}}}:
...
julia> vec[3]
LinkNode c, a tip of the tree with an incoming connection (branch 1).
julia> typeof(vec[3])
LinkNode{OneRoot, String, Dict{String, Any}, LinkBranch{OneRoot, String, Dict{String, Any}, Float64}}
julia> vec[3].name
"c"
Does this help a little?
That helps a bit, but getroot() doesn't work for me:
julia> using Phylo, Plots
julia> tree = parsenewick("((a:1.0, (b:1.0, c:1.0):1.0):1.0, d:1.0);")
NamedTree with 4 tips, 7 nodes and 6 branches.
Leaf names are a, b, c and d
julia> getroot(tree)
ERROR: UndefVarError: getroot not defined
Stacktrace:
[1] top-level scope
@ none:1
julia>
In fact, most of the advertised functions on [http://docs.ecojulia.org/Phylo.jl/stable/] don't work.
I'm using Julia Version 1.7.2 (2022-02-06)
(@v1.7) pkg> st Phylo
Status ~/.julia/environments/v1.7/Project.toml
[aea672f4] Phylo v0.3.3
Is my version too old? I just installed it using ] add Phylo.
Yes the current version is 0.4.21. Can you post the full status? Or try to install it into a clean environment?
(@v1.7) pkg> status
Status ~/.julia/environments/v1.7/Project.toml
[1520ce14] AbstractTrees v0.2.1
[3637df68] Bio v1.0.1
[8f4d0f93] Conda v1.7.0
[aae7a2af] DiffEqFlux v0.7.0
[0c46a032] DifferentialEquations v6.9.0
[31c24e10] Distributions v0.23.8
[d3d5718d] Diversity v0.4.6
[b2ad6718] EmacsVterm v0.2.0
[cd3eb016] HTTP v0.8.19
[916415d5] Images v0.24.1
[682c06a0] JSON v0.21.3
[b964fa9f] LaTeXStrings v1.3.0
[442fdcdd] Measures v0.3.1
[e1d29d7a] Missings v0.4.5
[f57f5aa1] PNGFiles v0.3.14
[65888b18] ParameterizedFunctions v5.6.0
[aea672f4] Phylo v0.3.3
[c0d5b6db] PhyloPlots v0.3.1
[4c47b132] PhyloTrees v0.11.1
[a03496cd] PlotlyBase v0.3.1
[91a5bcdd] Plots v0.28.4
[d330b81b] PyPlot v2.10.0
[74087812] Random123 v1.4.2
[f2b01f46] Roots v0.8.4
[90137ffa] StaticArrays v0.12.5
[2913bbd2] StatsBase v0.32.2
[f3b207a7] StatsPlots v0.14.5
[789caeaf] StochasticDiffEq v6.16.1
OK I managed to create a new project and generate the environment, add Phylo and add DifferentialEquations. I now have the latest version of Phylo:
(testJulia) pkg> st Phylo
Project testJulia v0.1.0
Status ~/testJulia/Project.toml
[aea672f4] Phylo v0.4.21
So far, so good. How do I start a julia process so that it knows about the new project?
This is really useful reading for working with environments https://pkgdocs.julialang.org/v1/environments/
In brief, if you use VSCode it will generally use the environment associated with your work folder - so if you create an environment in the work folder using ]activate .
, opening that folder from VScode should automatically activate that environment in a julia session.
This works essentially like a souped-up version of RStudio's "projects" feature.
If you are working with scripts you can specify it like julia --project=testJulia myscript.jl
I guess just an added comment is that the reason you got this version issue is that you have some old package in your default environment that is keeping Phylo back because of dependency issues. For this reason, it is generally recommended to have an empty, or very lean, default environment, and only installing packages into project environments. This greatly reduces the risk of dependency issues.
I'm not myself completely disciplined, but I try - this is my default environment
(@v1.7) pkg> st
Status `~/.julia/environments/v1.7/Project.toml`
[6e4b80f9] BenchmarkTools v1.2.2
[336ed68f] CSV v0.9.11
[a93c6f00] DataFrames v1.1.1
[5fb14364] OhMyREPL v0.5.10
[91a5bcdd] Plots v1.25.7
[c3e4b0f8] Pluto v0.17.7
[295af30f] Revise v3.3.1
[f3b207a7] StatsPlots v0.14.30
But maybe - let us keep using this issue for your question on trait modelling and tree traversal and take any discussion of environments etc under the #ecology channel on https://julialang.zulipchat.com/register/ ?
Fair enough. I'll read up about environments. Thanks!
Great. In terms of your core question, could you specify what you mean by
recursively traverse the tree so I can simulate stochastic differential equations on the tree`
? I'm not quite sure I get it.
Simply that the only? way to simulate trait evolution on a tree is to start at the root and simulate along the branch to the next node. Then use the final value of the trait at that node as the starting value for the simulation along the next branch downstream. To step from the root along the tree it makes sense (to me) to use recursion because you can apply the same trait evolution function (in my example above, the Ornstein-Uhlenbeck process) to each subtree with the new "root" as the root of the subtree and using the starting value for the trait the end value along the branch leading to the new "root." The base case is when you reach a leaf in which case you simply simulate along the branch leading to the leaf and then return from the recursive function. This is the approach that I use in the Recurse! function, above. It works fine but I'm not sure about its use of memory resources or speed. It's certainly much faster than using the same approach in R. If there is another better approach to traversing the tree and simulating trait evolution along the way, I would be glad to know about it. My challenge now is to use the Phylo Tree/Branch/Node data structures (and DifferentialEquations.jl) as a basis for my code above, rather than my simpleminded Node type. Then I will have code that will allow for the evolution of almost any continuous stochastic process along any tree, not just Brownian Motion as in BrownianTrait (I think. I haven't used BrownianTrait. Is there a toy working example for BrownianTrait?)
I'm not quite sure what you need here, but it seems like you may be reconstructing code that already exists. There is a traversal()
function that can give you depth-first or breadth-first access to nodes (apologies, TraversalOrder
itself is not documented, but its definition is here). You can also simulate traits on the tree directly - the docs are here.
Maybe if you can explain what they don't do that you need, we can work out where to go from there?
@mschauer used Phylo to model OU processes a few years back
Yeah, I have some code for that somewhere, but essentially you need to do tree traversal and call StochDiffEq on each segment. Do you want to forward simulate or also do inference on the parameters given data?
Ideally both simulation and inference. I have code that works now. Essentially I do a recursive preorder tree traversal and use the previous branch's last simulated value as the starting value for the current branch. (I store the simulations in the branch rather than the node.) Not thought heavily about inference yet but my previous work in R suggests that it is difficult if all you have are the data at the leaves. There's just not much information about the evolutionary dynamics.
Please forgive my ignorance. I am a longtime R programmer/user but am new to Julia. What I really want to do is to recursively traverse the tree so I can simulate stochastic differential equations on the tree, ideally using DifferentialEquations.jl. I am trying to get my head around Phylo.jl as I don't want to re-invent the wheel and anyway Phylo.jl looks more featureful than what I could ever build on my own. Consider the following toy code:
First, how do I find the root node? getroots() doesn't work. Second, how can I work with an iterator over the branches? I tried this:
6-element Vector{Pair{Int64, Branch{String}}}: [node "Node 3"]-->[1.0 length branch 5]-->[node "Node 2"]
[node "Node 2"]-->[1.0 length branch 4]-->[node "a"]
[node "Node 3"]-->[1.0 length branch 6]-->[node "d"]
[node "Node 1"]-->[1.0 length branch 2]-->[node "b"]
[node "Node 2"]-->[1.0 length branch 3]-->[node "Node 1"]
[node "Node 1"]-->[1.0 length branch 1]-->[node "c"]
I am unsure how to work with this any further. Does the --> symbol mean anything programmatically? Or is it literally just an arrow pointing from node to node/leaf? How do I get access to the information in the elements of the Vector? To put this into more context, here is a toy version of my code using DifferentialEquations.jl which uses my own Node type, which does what I want to do. It doesn't scale up very well because I haven't written a Newick tree parser. I just built the Nodes by hand. (The main reason to use Phylo.jl):
I'm sorry for the quality of the code, I just translated the way I envisaged it in my head and haven't thought about different ways to do it that might be cleaner and more efficient. Also, I notice the Tutorial/Quick Start document is missing from the website. I would be happy to work on that if you like! I am obviously still learning but I should get into the position where I am able to write such a document.