KlausVigo / phangorn

Phylogenetic analysis in R
http://klausvigo.github.io/phangorn/
202 stars 38 forks source link

attributes switched when producing trees using random.addition() #104

Closed jmonroynieto closed 3 years ago

jmonroynieto commented 3 years ago

I have an asynchronous function that outputs a multiphylo list of trees. When extracting the pscore from the attributes using sapply(treeList, attributes)["pscore",] some trees have the order parameter switched with score. I believe the bug to be related to rtree() since I am running pratchet() as follows :

pratchet(data, rearrangements= "SPR", perturb    ation = "ratchet", start = rtree(length(data),rooted = TRUE, tip.label = names(data)), k=ratchet_k)

When running random.addition() as a starting tree, this does not happen.

     1           2           3           4           5           6
names  Character,3 Character,3 Character,3 Character,3 Character,3 Character,3
class  "phylo"     "phylo"     "phylo"     "phylo"     "phylo"     "phylo"
order  "postorder" "postorder" 85          "postorder" "postorder" 85
pscore 85          85          "cladewise" 85          85          "cladewise"
       7           8           9           10          11          12
names  Character,3 Character,3 Character,3 Character,3 Character,3 Character,3
class  "phylo"     "phylo"     "phylo"     "phylo"     "phylo"     "phylo"
order  85          "postorder" 85          "postorder" 85          85
pscore "cladewise" 85          "cladewise" 85          "cladewise" "cladewise"
       13          14          15          16          17          18
names  Character,3 Character,3 Character,3 Character,3 Character,3 Character,3
class  "phylo"     "phylo"     "phylo"     "phylo"     "phylo"     "phylo"
order  85          "postorder" 85          "postorder" "postorder" 85
pscore "cladewise" 85          "cladewise" 85          85          "cladewise"
       19          20          21          22          23          24
names  Character,3 Character,3 Character,3 Character,3 Character,3 Character,3
class  "phylo"     "phylo"     "phylo"     "phylo"     "phylo"     "phylo"
order  85          "postorder" 85          "postorder" 85          85
pscore "cladewise" 85          "cladewise" 85          "cladewise" "cladewise"
       25          26          27          28          29          30
names  Character,3 Character,3 Character,3 Character,3 Character,3 Character,3
class  "phylo"     "phylo"     "phylo"     "phylo"     "phylo"     "phylo"
order  "postorder" "postorder" 85          "postorder" "postorder" 85
pscore 85          85          "cladewise" 85          85          "cladewise"

The following workaround does not work, this leads me to think that the funky behaviour come from some interactions with sapply()

x <- pratchet(data, rearrangements= "SPR", perturbation = "ratchet", start = rtree(length(data),rooted = TRUE, tip.label = names(data)), k=ratchet_k)
realscore <- attributes(x)$order
attributes(x)$order <- attributes(x)$pscore
attributes(x)$pscore <- realscore
return(x)

EDIT: a previous version of this issue indicated that random.addition() was the culprit. Developing the workaround, the real crook was revealed.

jmonroynieto commented 3 years ago

it seems that not all phylo objects are the same, running str on my output:

Class "multiPhylo"
List of 3
 $ 1:List of 3
  ..$ edge     : int [1:23, 1:2] 17 18 24 24 24 18 19 19 17 17 ...
  ..$ Nnode    : int 8
  ..$ tip.label: chr [1:16] "sequence6" "sequence15" "sequence16" "sequence13" ...
  ..- attr(*, "class")= chr "phylo"
  ..- attr(*, "pscore")= num 85
  ..- attr(*, "order")= chr "cladewise"
 $ 2:List of 3
  ..$ edge     : int [1:29, 1:2] 18 18 28 28 20 20 19 19 22 22 ...
  ..$ tip.label: chr [1:16] "sequence1" "sequence7" "sequence8" "sequence9" ...
  ..$ Nnode    : num 14
  ..- attr(*, "class")= chr "phylo"
  ..- attr(*, "order")= chr "postorder"
  ..- attr(*, "pscore")= num 85
 $ 3:List of 3
  ..$ edge     : int [1:23, 1:2] 17 18 18 20 19 19 20 24 24 24 ...
  ..$ Nnode    : int 8
  ..$ tip.label: chr [1:16] "sequence14" "sequence16" "sequence4" "sequence6" ...
  ..- attr(*, "class")= chr "phylo"
  ..- attr(*, "pscore")= num 85
  ..- attr(*, "order")= chr "cladewise"

The only difference is the order of the attributes, which reliably comes out switched.

The solution I found is a little ugly but is works: sapply(lapply(treeList, attributes), function(x) x [["pscore"]]

KlausVigo commented 3 years ago

Hi @jmonroynieto ,

I just rewrote a lot of the parsimony code (Fitch algorithm), so you want to try the version on github. Please keep me updated if you discover any problems. It should be faster and use less memory, but sometimes results can differ slightly. I will try to upload it the next days on CRAN.

Generally with S3 classes, I would not bet that the order of attributes. So extract them by name, either with $ or attr. There are quite a few optional arguments to phylo obejects, like edge.length, node.label, order, pscore. You can extract (if exists) the pscore with attr(x, "pscore")

This line is a bit less "ugly" and more readable:

sapply(treeList, function(x) attr(x, "pscore"))

Cheers, Klaus