arborworkflows / ArborWebApps

A bundle of Tangelo applications used by NSF Arbor (Phylogenetic Comparative Methods system)
Apache License 2.0

Arbor needs to improve support for Nexus files #88

Open curtislisle opened 9 years ago

curtislisle commented 9 years ago

NeXML is an emerging standard and the preferred way to import trees into OpenTree, so we should try to support it eventually. The Python ETE library supports NeXML. Maybe we can add ETE to the Python configuration, as it seems to have many nice features (I haven't looked deeply yet). It could be used in a Romanesco formatting step to convert between NeXML and nested JSON.

https://pythonhosted.org/ete2/tutorial/tutorial_nexml.html
http://etetoolkit.org

NeXML is also readable by RNeXML, an R library from the rOpenSci project:

http://cran.r-project.org/web/packages/RNeXML/vignettes/tutorial.html
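
As a rough illustration of the NeXML to nested JSON step discussed above, here is a minimal sketch that uses only the Python standard library rather than ETE. It assumes the tree is stored as `node` and `edge` elements with `source`, `target`, and `length` attributes. The output keys (`name`, `length`, `children`) and the function name are hypothetical and are not taken from Romanesco's actual nested format.

```python
import json
import xml.etree.ElementTree as ET

# Tiny hand-written NeXML document, used only for illustration.
SAMPLE_NEXML = """<nex:nexml xmlns:nex="http://www.nexml.org/2009" version="0.9">
  <trees id="trees1" otus="otus1">
    <tree id="tree1">
      <node id="n1" root="true"/>
      <node id="n2" label="A"/>
      <node id="n3" label="B"/>
      <edge id="e1" source="n1" target="n2" length="1.0"/>
      <edge id="e2" source="n1" target="n3" length="2.5"/>
    </tree>
  </trees>
</nex:nexml>"""


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def nexml_tree_to_nested(xml_text):
    """Convert the first <tree> in a NeXML string to a nested dict.

    The key names in the result are assumptions made for this sketch.
    """
    root = ET.fromstring(xml_text)
    tree = next(el for el in root.iter() if local_name(el.tag) == "tree")

    labels, children, lengths, targets = {}, {}, {}, set()
    for el in tree:
        kind = local_name(el.tag)
        if kind == "node":
            labels[el.get("id")] = el.get("label")
        elif kind == "edge":
            children.setdefault(el.get("source"), []).append(el.get("target"))
            targets.add(el.get("target"))
            if el.get("length") is not None:
                lengths[el.get("target")] = float(el.get("length"))

    # The root is the only node that never appears as an edge target.
    roots = [node_id for node_id in labels if node_id not in targets]
    if len(roots) != 1:
        raise ValueError("expected exactly one root node")

    def build(node_id):
        node = {"name": labels[node_id] or ""}
        if node_id in lengths:
            node["length"] = lengths[node_id]
        kids = [build(child) for child in children.get(node_id, [])]
        if kids:
            node["children"] = kids
        return node

    return build(roots[0])


nested = nexml_tree_to_nested(SAMPLE_NEXML)
print(json.dumps(nested, indent=2))
```

The recursion in `build` is fine for small trees; a very deep tree would need an iterative traversal.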

curtislisle commented 9 years ago

Bob and Francois suggest using Francois's rncl package to read Nexus files:

http://francoismichonneau.net/2014/12/rncl/

jeffbaumes commented 9 years ago

We would need to add Romanesco format converters, documentation, and a test.

jeffbaumes commented 9 years ago

@BetsyMcPhail could you focus on the package here if you haven't already, for reading Newick and Nexus files:

http://francoismichonneau.net/2014/12/rncl/

Thanks.

jeffbaumes commented 9 years ago

This appears to be a two-part problem:

  1. Update Nexus tree converters to use rncl.
  2. Create an analysis that takes a Nexus tree as input and creates a tree AND table as output. This could be optionally used in Easy Mode apps to specify both tree and table with a single Nexus file.
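
To make the second item concrete, here is a minimal sketch of splitting one Nexus file into a tree and a table, written in plain Python with regular expressions. It is not based on rncl or on any existing Arbor analysis. The function name and the output layout are hypothetical, and the parsing ignores Nexus comments, TRANSLATE tables, interleaved matrices, and quoted taxon names.

```python
import re

# Small hand-written Nexus file, used only for illustration.
SAMPLE_NEXUS = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=3;
  TAXLABELS A B C;
END;
BEGIN CHARACTERS;
  DIMENSIONS NCHAR=2;
  FORMAT DATATYPE=STANDARD SYMBOLS="01";
  MATRIX
    A 01
    B 11
    C 00
  ;
END;
BEGIN TREES;
  TREE tree1 = ((A:1,B:1):1,C:2);
END;
"""

FLAGS = re.IGNORECASE | re.DOTALL
BLOCK_RE = re.compile(r"begin\s+(\w+)\s*;(.*?)\bend\s*;", FLAGS)
MATRIX_RE = re.compile(r"\bmatrix\b(.*?);", FLAGS)
TREE_RE = re.compile(r"\btree\s+([^\s=]+)\s*=\s*(?:\[&[RU]\]\s*)?([^;]*;)", FLAGS)


def split_nexus(text):
    """Return {'trees': {name: newick}, 'table': {taxon: [states]}}."""
    result = {"trees": {}, "table": {}}
    for match in BLOCK_RE.finditer(text):
        block_name = match.group(1).lower()
        body = match.group(2)
        if block_name == "trees":
            for tree in TREE_RE.finditer(body):
                result["trees"][tree.group(1)] = tree.group(2).strip()
        elif block_name in ("characters", "data"):
            matrix = MATRIX_RE.search(body)
            if matrix is None:
                continue
            for line in matrix.group(1).splitlines():
                parts = line.split(None, 1)
                if len(parts) == 2:
                    taxon, states = parts
                    # One list entry per character state symbol.
                    result["table"][taxon] = list(states.replace(" ", ""))
    return result


parsed = split_nexus(SAMPLE_NEXUS)
print(parsed["trees"])
print(parsed["table"])
```

An Easy Mode app could then pass `parsed["trees"]` to the tree input and `parsed["table"]` to the table input, but a production version should rely on a real Nexus parser rather than these regular expressions.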

jeffbaumes commented 9 years ago

If we are depending on new R libraries, they will also need to be installed as part of Ansible provisioning for TangeloHub in https://github.com/Kitware/tangelohub/blob/master/devops/ansible/playbook.yml#L208-L223

BetsyMcPhail commented 9 years ago

The following functions are available in rncl:

  make_phylo(file, simplify = FALSE, ...)   <------ will be deprecated, don't use
  read_nexus_phylo(file, simplify = FALSE, ...)
  read_newick_phylo(file, simplify = FALSE, ...)

These functions read NEXUS or Newick files and return an [R] object of class phylo/multiPhylo.

Note there are no functions that go in the opposite direction (i.e. R --> NEXUS or Newick).

lukejharmon commented 9 years ago

Rudimentary functions for the opposite direction can be found in ape.

R-->Newick is write.tree()
R-->Nexus is write.nexus()

The first of these is solid (although I bet there are cases where it fails). The second can only write the tree to Nexus; I don't think it can handle character data or anything else.
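
If the opposite direction were instead handled in a Python converter, the core of a Newick writer is small. The sketch below serializes a nested dict to Newick. The `name`, `length`, and `children` keys and the function name are assumptions made for this example, and it does not quote names containing special characters.

```python
def to_newick(node):
    """Serialize a nested dict tree to a Newick string.

    Assumes hypothetical keys: 'name' (str), 'length' (number, optional),
    and 'children' (list of nodes, optional).
    """
    def fmt(n):
        kids = n.get("children", [])
        text = ("(" + ",".join(fmt(k) for k in kids) + ")") if kids else ""
        text += n.get("name", "")
        if "length" in n:
            # 'g' formatting drops trailing zeros, e.g. 1.0 -> "1".
            text += ":" + format(n["length"], "g")
        return text

    return fmt(node) + ";"


example_tree = {
    "name": "",
    "children": [
        {"name": "A", "length": 1.0},
        {"name": "B", "length": 2.5},
    ],
}
newick = to_newick(example_tree)
print(newick)
```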

jeffbaumes commented 9 years ago

So @lukejharmon would read_nexus/newick_phylo be drop-in replacements for the ape readers we currently use? Is the phylo/multiPhylo the same as the ape tree type or is there another conversion needed? Also, I understand Nexus can also have an embedded character matrix. How do we effectively extract that? For example, @bobthacker wants to be able to set a single Nexus file in an Easy Mode app instead of separate tree and matrix files.

curtislisle commented 9 years ago

I haven’t looked yet, but from Betsy’s comment, I bet the output of Francois’s package is a composite tree/matrix S4 object in R. We would have to copy from the output object to an ape tree, I expect. I think it is still worth looking at this to see if it satisfies @bobthacker as a tree/matrix dual reader. rncl (Francois’s package) reads singletons OK, according to the documentation.

I suggest an experimental integration where we add rncl to the packages installed in R, then try exercising it from analyses we build, before modifying Romanesco-level processing.

curtislisle commented 9 years ago

I stand corrected. The phylo and multiPhylo objects that come out of rncl are a single ape tree or a list of ape trees (phylo is an S3 R object). So it doesn’t look like we will have to copy out of a different type of tree representation. This should be easy to incorporate. rncl only supports trees, not character matrices.

The phylobase package (didn’t we install that already?) reads Nexus files. I might just play with this a bit.
