csmiguel / usat_snp

Phylogeographic inference with microsatellites vs SNPs: a case of two frogs from the Iberian Peninsula
1 stars 0 forks source link

Tree branch length normalization #2

Closed csmiguel closed 5 years ago

csmiguel commented 5 years ago

The metrics Dnp and Dnr are done over standarized tree distances. However, in the plots "DnrDnp_ggplot_bw.pdf" it is possible to see that metric for both markers do not have the same highest value. This could be caused by the fact that the lengths of the branches supporting the trees are different between usats and SNPs. For that reason, in the meeting from Sept 13th 2019, it was suggested to normalize the the distances using the maximum distance from any internal node (non-tip nodes) instead of normalizing as I have it now: maxd <- ape::cophenetic.phylo(tr1) %>% max() tr1$edge.length <- tr1$edge.length / maxd

csmiguel commented 5 years ago

Solved: ape::cophenetic.phylo just computes distances between tips. I used ape::dist.nodes instead, which computes all pairwise distances including patristic distances between internal nodes (as opposed to tips). It returns a matrix from which I select distances in the bottom-right quadrant, corresponding to distances between nodes only:

#Normalize length of the tree
  #distance between tips only
  disttips <- ape::cophenetic.phylo(tr1) %>% sort()
  #distance between all tips and nodes, but selecting only the quandrant of the
  # matrix with distances between tips
  distnodes <- ape::dist.nodes(tr1)[1:Ntip(tr1), 1:Ntip(tr1)] %>% sort()
  #asssert that the order of the columns is tips + nodes
  assertthat::assert_that(all(disttips == distnodes),
              msg = paste0("check order of tips and nodes in trees"))
  #meaning that the bottom-right quadrant of the matrix corresponds to
  #distances between internal nodes
  maxd <- ape::dist.nodes(tr1)[(ape::Ntip(tr1) + 1):tr1$Nnode,
                               (ape::Ntip(tr1) + 1):tr1$Nnode] %>% max()
  # normalize
  tr1$edge.length <- tr1$edge.length / maxd
csmiguel commented 5 years ago

Fix on matrix selection for getting longest distance:

maxd <- ape::dist.nodes(tr1)[(ape::Ntip(tr1) + 1):(tr1$Nnode + ape::Ntip(tr1)),
                             (ape::Ntip(tr1) + 1):(tr1$Nnode + ape::Ntip(tr1))] %>% max()