YuLab-SMU / treeio

:seedling: Base Classes and Functions for Phylogenetic Tree Input and Output
https://yulab-smu.top/treedata-book/
94 stars 24 forks source link

read.nextstrain.json fails on divergence trees #126

Open CecileTK opened 2 months ago

CecileTK commented 2 months ago

Description of the issue

I've recently faced issues when using the function read.nextstrain.json on divergence trees (with divergence units in number of mutations). When using it on a json file (zika-divergence.json) obtained from a modified version of the Zika Nextstrain workflow,, I obtain the following error:

Session Info ```r library(treeio) read.nextstrain.json("zika-divergence.json") Error in `dplyr::bind_rows()`: ! Can't combine `..1$div` and `..2$div` . Run `rlang::last_trace()` to see where the error occurred. ```

Proposed solution

Playing with the source code, it looks like the error comes from the following line of code, with some nodes being loaded with a div variable as an integer and some as a character:

https://github.com/YuLab-SMU/treeio/blob/95046174ef1f4508f9e4417d2b203e4078fd05c6/R/nextstrain.json.R#L53

Locally, I've managed to solve this by replacing:

https://github.com/YuLab-SMU/treeio/blob/95046174ef1f4508f9e4417d2b203e4078fd05c6/R/nextstrain.json.R#L39-L43

by:

if ('div' %in% colnames(id[['data']][[id[['id']]]])){ 
     parent.index <- id[['data']][[id[['id']]]][['parentID']] 

     # Systematically converting div to a numeric value
     id[["data"]][[id[["id"]]]][["div"]] <- as.numeric(id[["data"]][[id[["id"]]]][["div"]])

     id[['data']][[id[['id']]]][['branch.length']] <- id[['data']][[id[['id']]]][['div']] -  
         as.numeric(id[['data']][[parent.index]][['div']]) 
 } 

I'd be happy to submit a pull request regarding this. Would you have a preference regarding how to do this and whether it might be valuable to also add the json file (zika-divergence.json) ? I also noticed that read.nextstrain.json does not have a testing file. Would it make sense to include all of this as part of a testing file? If so, do you have suggestions on what the testing should check?

Thank you!