emmanuelparadis / ape

analysis of phylogenetics and evolution
http://ape-package.ird.fr/
GNU General Public License v2.0

No tip label for one of the tips #95

Closed lpipes closed 1 year ago

lpipes commented 1 year ago

Hi, I printed out the tip labels (tree$tip.label), but one of the tips doesn't have a label. I used:

write.table(as.data.frame(tree$tip.label), file = "tips.txt", sep = "\t", quote = FALSE)

In tips.txt:

12144628        EPI_ISL_3025220
12144629        EPI_ISL_2959021
12144630        EPI_ISL_16960747
12144631
12144632        EPI_ISL_454951
12144633        EPI_ISL_450458
12144634        EPI_ISL_476780

emmanuelparadis commented 1 year ago

Hi, how did you get the tree into R? It could have been badly processed by ape, or, if it came from a Newick file, the file itself may have been malformed.

lpipes commented 1 year ago

I just used read.tree() to get the tree into R. The Newick file was written by write.tree() after I had run multi2di() and collapse.singles().

emmanuelparadis commented 1 year ago

Can you post the file here?

lpipes commented 1 year ago

Unfortunately, the file is 64MB gzipped and only 25MB is allowed. Is there another way I can send it?

lpipes commented 1 year ago

I uploaded it to Google Drive: https://drive.google.com/file/d/1fxQ12_NTXu-gLH60k5TpZoOc_1W7Ig0y/view?usp=sharing

emmanuelparadis commented 1 year ago

It seems the Newick file has a problem: one tip label has been replaced by a 'newline' (\n): ...:0):0):0,(EPI_ISL_16960747:0,\n:0):0):0):0)...
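
The pattern described above can also be checked on the raw Newick text before parsing. The following is a minimal sketch in base R, assuming every tip carries a branch length, so that a missing label shows up as "(" or "," followed only by whitespace and then ":". The function name find_empty_labels is hypothetical and not part of ape.

```r
# Hypothetical helper (not part of ape): report the character positions in a
# Newick string where a tip label is missing, i.e. where "(" or "," is
# followed only by optional whitespace and then ":".
# Assumption: every tip has a branch length, as in the excerpt above.
find_empty_labels <- function(newick) {
  hits <- gregexpr("[(,][[:space:]]*:", newick)[[1]]
  if (hits[1] == -1L) integer(0) else as.integer(hits)
}

# Toy string modelled on the excerpt above: the sister of
# EPI_ISL_16960747 is just a newline.
toy <- "((EPI_ISL_2959021:0,(EPI_ISL_16960747:0,\n:0):0):0,EPI_ISL_454951:0);"
pos <- find_empty_labels(toy)
pos                              # start position(s) of the suspicious spot(s)
substr(toy, pos - 16, pos + 3)   # context around the first hit
```

Each reported position points at the "(" or "," that precedes the missing label, so printing a window around it with substr() shows which neighbouring tip to look for in the original data. Tips written without branch lengths would need a different pattern.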

Here's the R code to help diagnose this:

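fl <- "global.out.tree" # the Newick file, after unzipping

library(ape)
tree <- read.tree(fl)
nc <- nchar(tree$tip.label)
zero.tiplab <- nc == 0
sum(zero.tiplab) # how many zero-length labels?
i <- which(zero.tiplab) # index of the unlabelled tip (assumes there is only one)
w <- which(tree$edge[, 2] == i) # the edge leading to that tip
anc <- tree$edge[w, 1] # its parent node
j <- which(tree$edge[, 1] == anc) # all edges descending from the parent
tree$edge[j, ] # the second column lists the children of that node
tree$tip.label[12144630] # <- this is the sister of ""
tree$tip.label[12144631] # ""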

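x <- readBin(fl, raw(), file.size(fl)) # read the whole file as raw bytes
xchar <- rawToChar(x)
sister <- tree$tip.label[12144630]
gregexpr(sister, xchar, fixed = TRUE) # position of the sister label in the file
## use the position reported above to print the surrounding text:
substr(xchar, 272732648 - 10, 272732648 + 31)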
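
Once the empty label has been located, one possible workaround on the R side is to give it a placeholder name so that downstream code does not silently carry an unnamed tip. This is only a sketch in base R: the function name fill_empty_labels and the prefix "unnamed_" are hypothetical. Whether a placeholder is appropriate, rather than regenerating the Newick file, depends on how the label was lost.

```r
# Hypothetical helper (not part of ape): replace empty or whitespace-only
# labels with numbered placeholders. It does not check whether a placeholder
# collides with an existing label.
fill_empty_labels <- function(labels, prefix = "unnamed_") {
  empty <- !nzchar(trimws(labels))
  if (any(empty)) {
    labels[empty] <- paste0(prefix, seq_len(sum(empty)))
  }
  labels
}

# Toy label vector modelled on tips.txt above.
labs <- c("EPI_ISL_16960747", "", "EPI_ISL_454951")
fill_empty_labels(labs)

# With a "phylo" object, the same call would be applied to the label vector:
# tree$tip.label <- fill_empty_labels(tree$tip.label)
```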