EcoJulia / Phylo.jl

Simple phylogenetic trees in Julia to work with Diversity.jl - https://github.com/EcoJulia/Diversity.jl
BSD 2-Clause "Simplified" License

editing leaf names #54

Closed shaul-pollak closed 10 months ago

shaul-pollak commented 3 years ago

Hi! Thanks for this awesome package! Is it possible to change the leaf names of a tree after it has been created? I couldn't find this functionality. Thanks!

richardreeve commented 3 years ago

Hmm. No, not at the moment. Can I ask why you want to do that?

shaul-pollak commented 3 years ago

Of course! For instance, when I'm reading a tree that is the output of some program, a leaf may have a very complicated name like GCF_somethingsomething.1_ASMsomethingsomething, which is hard for my puny human mind. I want to change it to just GCF_something. This happens quite a lot when working with external databases like NCBI or JGI. Thanks for the quick answer!

richardreeve commented 3 years ago

Fair enough. I'll have a think about whether there's an easy way of doing it...

shaul-pollak commented 3 years ago

thank you!!

richardreeve commented 10 months ago

Sorry it has taken so long to get around to this, @shaul-pollak. I think I have a workaround, on the off chance you're still interested. As of v0.5.1 we can export trees, so you can export the tree to Newick and reimport it to solve your problem: for various Nexus-format-related reasons, the exported tree can have different node names from the tree in memory.

So:

julia> using Random, Phylo

julia> tree = rand(Nonultrametric(10));

julia> nn = getnodenames(tree);

julia> d = Dict(nn .=> nn);

julia> d["tip 1"] = "first tip"
"first tip"

julia> new_tree = parsenewick(Phylo.outputtree(tree, Newick(d)));

julia> getleafnames(new_tree)
10-element Vector{String}:
 "tip 8"
 "tip 3"
 "tip 5"
 "tip 7"
 "tip 9"
 "first tip"
 "tip 6"
 "tip 4"
 "tip 2"
 "tip 10"

You can also leave some nodes out of the dictionary (here, the internal nodes), and they'll be given default names as they are read in:

julia> ln = getleafnames(tree);

julia> d2 = Dict(ln .=> string.(Ref("newer "), ln))
Dict{String, String} with 10 entries:
  "tip 7"  => "newer tip 7"
  "tip 4"  => "newer tip 4"
  "tip 8"  => "newer tip 8"
  "tip 1"  => "newer tip 1"
  "tip 9"  => "newer tip 9"
  "tip 2"  => "newer tip 2"
  "tip 10" => "newer tip 10"
  "tip 6"  => "newer tip 6"
  "tip 5"  => "newer tip 5"
  "tip 3"  => "newer tip 3"

julia> newer_tree = parsenewick(Phylo.outputtree(tree, Newick(d2)));

julia> getnodenames(newer_tree)
19-element Vector{String}:
 "Node 19"
 "Node 18"
 "Node 16"
 "newer tip 4"
 "newer tip 2"
 "newer tip 10"
 "Node 13"
 "Node 12"
 "newer tip 1"
 "newer tip 6"
 "Node 9"
 "Node 8"
 "newer tip 9"
 "newer tip 7"
 "Node 5"
 "newer tip 8"
 "Node 4"
 "newer tip 3"
 "newer tip 5"
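
For the database-style names mentioned earlier in the thread, the renaming dictionary could be built programmatically rather than by hand. This is only a rough sketch: `shortname` is a hypothetical helper (not part of Phylo.jl), the example identifiers are made up, and it assumes the part you want to keep is everything before the first `.`.

```julia
# Hypothetical helper (not part of Phylo.jl): keep the text before the first '.'
shortname(name::AbstractString) = String(first(split(name, '.')))

# Made-up leaf names in the style of NCBI assembly identifiers;
# with a real tree these would come from getleafnames(tree)
leafnames = ["GCF_000001.1_ASM1v1", "GCF_000002.2_ASM2v1"]

# Map every old leaf name to its shortened form
renames = Dict(n => shortname(n) for n in leafnames)

# Shortened names must stay unique, otherwise two leaves would collide
@assert allunique(values(renames))

# The dictionary would then go through the same export/reimport round trip
# as d and d2 above (assumes Phylo is loaded and `tree` exists):
# new_tree = parsenewick(Phylo.outputtree(tree, Newick(renames)))
```

The uniqueness check matters if several assemblies share the same prefix, since the shortened names would then no longer distinguish the leaves.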

Anyway, that's about as good as it gets at the moment. I don't know if it's still any use to you, but I thought I'd mention it just in case.

richardreeve commented 10 months ago

Okay, renamenode!(tree, node, "new name") now works for some tree types in #91. It returns true if it succeeds, and false if it fails or the tree type is not supported; it may fail even for supported tree types if the new name is a duplicate. The new RecursiveTree types can rename nodes so long as they carry no leaf information or store it in a Dict, which covers the default types like RootedTree.
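
A minimal usage sketch of the call described above. It assumes a Phylo.jl version that includes #91, that the node can be identified by its current name, and that the randomly generated tree is one of the supported default types with tips named as in the earlier example; none of that is checked here.

```julia
using Phylo

# Random tree with 10 tips, as in the earlier example
tree = rand(Nonultrametric(10))

# true on success; false if the rename fails (e.g. duplicate name)
# or the tree type does not support renaming
renamed = renamenode!(tree, "tip 1", "first tip")

if renamed
    println("Renamed; leaves are now: ", getleafnames(tree))
else
    println("Rename failed or is not supported for ", typeof(tree))
end
```

Because failure is reported through the return value, it is worth checking the result before relying on the new name.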