YuLab-SMU / treeio

:seedling: Base Classes and Functions for Phylogenetic Tree Input and Output
https://yulab-smu.top/treedata-book/

update child function #75

Closed xiangpin closed 2 years ago

xiangpin commented 2 years ago

Description

The original child() function can only return the direct children of a node. Sometimes, though, we want to extract all descendant nodes, only the tips, or only the internal nodes. This request updates child() to support these cases.

Related Issue

Example

> library(treeio)
treeio v1.19.1.900 For help:
https://yulab-smu.top/treedata-book/

If you use the ggtree package suite in published research, please cite
the appropriate paper(s):

LG Wang, TTY Lam, S Xu, Z Dai, L Zhou, T Feng, P Guo, CW Dunn, BR
Jones, T Bradley, H Zhu, Y Guan, Y Jiang, G Yu. treeio: an R package
for phylogenetic tree input and output with richly annotated and
associated data. Molecular Biology and Evolution. 2020, 37(2):599-603.
doi: 10.1093/molbev/msz240

G Yu. Data Integration, Manipulation and Visualization of Phylogenetic
Trees (1st ed.). Chapman and Hall/CRC. 2022. ISBN: 9781032233574

Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam.
ggtree: an R package for visualization and annotation of phylogenetic
trees with their covariates and other associated data. Methods in
Ecology and Evolution. 2017, 8(1):28-36. doi:10.1111/2041-210X.12628

> set.seed(123)
> tr <- rtree(10)
> tr %>% child(.node=14, type='all')
 [1] 15 17 16  6  4  5 18 19  7  8  9 10
> tr %>% child(.node=14, type='tips')
[1]  6  4  5  7  8  9 10
> tr %>% child(.node=14, type='internal')
[1] 15 17 16 18 19
> tr %>% child(.node=14)
[1] 15 17
> library(MicrobiotaProcess)
MicrobiotaProcess v1.7.8.990 For help:
https://github.com/YuLab-SMU/MicrobiotaProcess/issues

If you use MicrobiotaProcess in published research, please cite the
paper:

S Xu, L Zhan, W Tang, Z Dai, L Zhou, T Feng, M Chen, S Liu, X Fu, T Wu,
E Hu, G Yu. MicrobiotaProcess: A comprehensive R package for managing
and analyzing microbiome and other ecological data within the tidy
framework. 04 February 2022, PREPRINT (Version 1) available at Research
Square [https://doi.org/10.21203/rs.3.rs-1284357/v1]

This message can be suppressed by:
suppressPackageStartupMessages(library(MicrobiotaProcess))
> data(mouse.time.mpse)
> mouse.time.mpse %>% mp_extract_taxatree() -> taxa.tree
> taxa.tree %>% filter(nodeClass=="Phylum", keep.td=FALSE) %>% pull(label) -> xx
> taxa.tree %>% child(.node=xx, type='all')
$p__Actinobacteria
 [1] 230 231 243 259 286 343   1 244 260 261 287 344   2 288 345 346   3   4   5

$p__Bacteroidetes
 [1] 232 245 262 263 264 289 347   6 290 348   7   8   9  10  11  12  13  14  15
[20]  16  17  18  19  20  21 291 349  22

$p__Cyanobacteria
[1] 233 246 265 292 350  23  24  25

$`p__Deinococcus-Thermus`
[1] 234 247 266 293 351  26

$p__Firmicutes
  [1] 235 236 237 248 267 268 294 352  27  28  29 295 353 354  30  31  32 249
 [19] 269 270 271 272 273 274 275 296 355  33 297 298 356  34 357  35  36 299
 [37] 358  37  38  39  40  41  42  43  44 300 301 359  45 360  46 302 303 304
 [55] 305 306 307 308 309 310 311 312 313 314 315 316 317 318 361  47  48  49
 [73]  50 362 363  51  52  53  54  55  56  57  58  59  60 364  61  62 365  63
 [91]  64 366  65  66  67  68  69 367  70 368  71  72  73  74  75  76  77  78
[109]  79 369  80  81  82 370 371  83  84  85  86  87  88  89  90  91  92  93
[127]  94  95  96  97  98  99 100 101 102 103 104 105 372 106 373 107 108 109
[145] 110 374 111 375 376 112 113 114 377 115 116 117 118 119 120 121 122 378
[163] 123 379 124 380 125 126 127 128 129 130 131 132 133 134 135 136 137 138
[181] 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156
[199] 157 158 159 160 161 162 163 164 165 166 167 319 381 168 169 320 321 322
[217] 323 324 325 326 327 328 329 330 331 382 170 171 383 172 384 173 385 174
[235] 386 175 176 177 387 178 179 180 388 181 182 183 184 389 185 186 390 187
[253] 188 189 190 391 191 392 192 193 194 195 196 197 393 198 199 200 201 202
[271] 250 276 332 333 394 203 395 204

$p__Patescibacteria
[1] 238 251 277 334 396 205 206

$p__Proteobacteria
 [1] 239 240 252 278 335 397 207 253 254 255 279 336 398 208 280 337 399 209 281
[20] 282 338 400 210 339 401 211

$p__Tenericutes
 [1] 241 256 257 283 340 402 212 284 341 403 213 214 215 216 217

$p__Verrucomicrobia
[1] 242 258 285 342 404 218

> taxa.tree %>% child(.node=xx, type='all') %>% purrr::map(list) %>% do.call(rbind, .) %>% tibble::as_tibble(rownames='Phyla')
# A tibble: 9 × 2
  Phyla                  V1
  <chr>                  <list>
1 p__Actinobacteria      <int [19]>
2 p__Bacteroidetes       <int [28]>
3 p__Cyanobacteria       <int [8]>
4 p__Deinococcus-Thermus <int [6]>
5 p__Firmicutes          <int [278]>
6 p__Patescibacteria     <int [7]>
7 p__Proteobacteria      <int [26]>
8 p__Tenericutes         <int [15]>
9 p__Verrucomicrobia     <int [6]>

GuangchuangYu commented 2 years ago

What's the difference between the new feature and offspring()?

xiangpin commented 2 years ago

Yes, it is similar to offspring(). But offspring() has several extra arguments, and it depended on many ggtree functions. Also, neither offspring() nor the old child() supported a vector .node (length > 1), and offspring() could not return only the internal nodes (without tips). This request uses a recursive method; I have not compared the performance of the two.
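
To make the recursive idea concrete, here is a minimal base-R sketch. It is not the code in this request: descendants_sketch() and descendants_multi() are hypothetical names, and the sketch assumes an ape-style edge matrix (column 1 = parent, column 2 = child) in which tips are numbered 1..n_tip and internal nodes are numbered above n_tip. The result is in depth-first order, which may differ from the order treeio returns.

```r
# Hypothetical helper, not the treeio implementation.
# edge:  two-column matrix (parent, child), as in an ape "phylo" object
# node:  a single node number
# n_tip: number of tips; tips are assumed to be numbered 1..n_tip
descendants_sketch <- function(edge, node, n_tip,
                               type = c("children", "all", "tips", "internal")) {
    type <- match.arg(type)
    # Direct children of `node` (empty if `node` is a tip).
    direct <- edge[edge[, 1] == node, 2]
    if (type == "children") {
        return(direct)
    }
    # Recurse into each direct child and collect everything below it.
    below <- unlist(lapply(direct, function(x) {
        c(x, descendants_sketch(edge, x, n_tip, type = "all"))
    }))
    if (is.null(below)) {
        below <- integer(0)
    }
    switch(type,
           all      = below,
           tips     = below[below <= n_tip],
           internal = below[below > n_tip])
}

# Hypothetical wrapper that accepts a vector of nodes (length > 1)
# and returns a named list, or a plain vector for a single node.
descendants_multi <- function(edge, nodes, n_tip, type = "children") {
    res <- lapply(nodes, descendants_sketch,
                  edge = edge, n_tip = n_tip, type = type)
    names(res) <- nodes
    if (length(res) == 1) res[[1]] else res
}

# Toy tree with 4 tips (1..4) and internal nodes 5 (root), 6 and 7.
edge <- rbind(c(5, 6), c(6, 7), c(7, 1), c(7, 2), c(6, 3), c(5, 4))
n_tip <- 4

descendants_sketch(edge, 5, n_tip)               # 6 4
descendants_sketch(edge, 5, n_tip, "all")        # 6 7 1 2 3 4
descendants_sketch(edge, 5, n_tip, "tips")       # 1 2 3 4
descendants_sketch(edge, 5, n_tip, "internal")   # 6 7
descendants_multi(edge, c(6, 7), n_tip, "tips")  # list: `6` = 1 2 3, `7` = 1 2
```

The recursion visits every edge below the start node once per level, so on large trees an iterative traversal may well be faster; that comparison has not been made here.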

GuangchuangYu commented 2 years ago

offspring() has no ggtree dependency; see https://github.com/YuLab-SMU/treeio/blob/master/R/offspring.R#L55-L85.

I think it would be better to move the functionalities of extracting offspring nodes (internal, external or all) to the offspring function.

child() can be a superset of offspring(), but only in this way:

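child <- function(..., type = 'child') {
    # `==` compares; a single `=` inside if() is a syntax error in R.
    if (type == 'child') {
        # do something, the actual child function.
    } else {
        # any other type is delegated to offspring()
        offspring(..., type = type)
    }
}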

This is very important to avoid confusing users.