ContentMine / phylotree

A repository for ami-phylotree development

Conflict in Newick validity between R & p4/STK2 #22

Open rossmounce opened 9 years ago

rossmounce commented 9 years ago

R and p4/STK2 seem to disagree over whether trees are valid or not. :disappointed:

Newick file: ijs.0.65514-0-000.pbm.nwk

((((D0062743:155.0,(M62795:103.0,(AFO78775:118.0,ABO78049:61.0)NT1.12:48.0)NT1.9:25.0)NT1.7:33.0,ABO78055:121.0)NT1.5:15.0,(A3278570:99.0,(AB264798:31.0,D12657:57.0)NT1.10:53.0)NT1.8:49.0)NT1.3:86.0,((EF407879:242.0,(AB192292:146.0,(M62798:135.0,DQ457019:206.0)NT1.6:33.0)NT1.4:17.0)NT1.2:36.0,(DQ244076:27.0,00244077:29.0)NT1.11:157.0)NT1.1:33.0)NT1.27;

R reads it in and plots it fine, but p4/STK2 reports an unmatched parenthesis:

***Error: failed to parse a tree in your data set.
Error parsing tree

Tree.parseNewick(), tree 't0'
    Unmatched unparen.
((((D0062743:155.0,(M62795:103.0,(AFO78775:118.0,ABO78049:61.0)NT1.12:48.0)NT1.9:25.0)NT1.7:33.0,ABO78055:121.0)NT1.5:15.0,(A327
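The tree text itself appears to be balanced. A quick base-R check can confirm this. It uses a hypothetical helper, `newick_parens_balanced()`, which is not part of R, p4 or STK2. The helper counts the parentheses in the tree above and checks that no ")" ever closes more than has been opened. It ignores Newick's quoted-label rules, which is harmless here because this tree contains no quotes.

```r
# Hypothetical helper (not part of R, p4 or STK2): scan a Newick string and
# check that every ")" closes an earlier "(" and nothing is left open.
# Quoted labels are NOT handled; this tree has none.
newick_parens_balanced <- function(newick) {
  depth <- 0L
  for (ch in strsplit(newick, "")[[1]]) {
    if (ch == "(") {
      depth <- depth + 1L
    } else if (ch == ")") {
      depth <- depth - 1L
      if (depth < 0L) return(FALSE)  # closing paren with no opener
    }
  }
  depth == 0L  # TRUE only if every "(" was closed
}

# The tree from ijs.0.65514-0-000.pbm.nwk, pasted verbatim in pieces
tree <- paste0(
  "((((D0062743:155.0,(M62795:103.0,(AFO78775:118.0,ABO78049:61.0)NT1.12:48.0)",
  "NT1.9:25.0)NT1.7:33.0,ABO78055:121.0)NT1.5:15.0,(A3278570:99.0,(AB264798:31.0,",
  "D12657:57.0)NT1.10:53.0)NT1.8:49.0)NT1.3:86.0,((EF407879:242.0,(AB192292:146.0,",
  "(M62798:135.0,DQ457019:206.0)NT1.6:33.0)NT1.4:17.0)NT1.2:36.0,(DQ244076:27.0,",
  "00244077:29.0)NT1.11:157.0)NT1.1:33.0)NT1.27;"
)

chars <- strsplit(tree, "")[[1]]
cat("opening:", sum(chars == "("), " closing:", sum(chars == ")"), "\n")
print(newick_parens_balanced(tree))
```

For this tree the check should find 13 opening and 13 closing parentheses and report TRUE. If so, p4's "Unmatched unparen" error presumably comes from something other than a plain bracket mismatch, such as how it tokenizes the labels. This check alone cannot say what that something is.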

rossmounce commented 9 years ago

I'm going to start looking into MRP matrix creation in R instead of STK2 for this reason, specifically with the phytools package: http://www.inside-r.org/packages/cran/phytools/docs/mrp.supertree

rossmounce commented 9 years ago

It's easy to script, but I'm concerned R will be too slow to calculate the supertree. I'd like to get just the MRP matrix out, but phytools doesn't seem to offer that step on its own. Too clever for its own good!

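library(phytools)
# Read all source trees from one Newick file (a multiPhylo object)
trees274 <- read.newick(file = "asciitreesv2.tre")
summary(trees274)
# mrp.supertree() builds the MRP matrix internally and then searches it;
# only one search method is used, so name it explicitly
supertree <- mrp.supertree(trees274, method = "pratchet")
# ...wait a long time. TNT would do this a lot quicker than R
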
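phytools does not hand back the intermediate matrix, but the coding step itself is simple to sketch. Below is a minimal base-R version of standard Baum/Ragan MRP coding. It assumes each source tree has already been reduced to its taxon set and its non-trivial clades, for example with ape. `mrp_matrix()` and the list layout are hypothetical and are not the API of phytools or mrpmatrix.

```r
# Hypothetical sketch of Baum/Ragan MRP coding. Each source tree is a list
# with $taxa (its taxon set) and $clades (non-trivial clades, each a
# character vector of taxa). Every clade becomes one matrix column.
mrp_matrix <- function(trees) {
  taxa <- sort(unique(unlist(lapply(trees, `[[`, "taxa"))))
  columns <- list()
  for (tr in trees) {
    for (clade in tr$clades) {
      col <- rep("?", length(taxa))  # taxon absent from this source tree
      names(col) <- taxa
      col[tr$taxa] <- "0"            # in this tree but outside the clade
      col[clade] <- "1"              # inside the clade
      columns[[length(columns) + 1]] <- col
    }
  }
  # Rows = taxa, columns = MRP characters (filled column by column)
  matrix(unlist(columns), nrow = length(taxa),
         dimnames = list(taxa, paste0("c", seq_along(columns))))
}

# Two toy source trees with partially overlapping taxa
trees <- list(
  list(taxa = c("A", "B", "C", "D"), clades = list(c("A", "B"))),
  list(taxa = c("A", "C", "E"), clades = list(c("C", "E")))
)
m <- mrp_matrix(trees)
print(m)
```

Each clade yields one binary column: 1 inside the clade, 0 elsewhere in that source tree, and ? for taxa the tree does not contain. A matrix like this is what a fast parsimony program such as TNT would then search.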
rossmounce commented 9 years ago

Even better: https://github.com/smirarab/mrpmatrix is a very simple program that creates exactly the MRP matrix I need.
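Once the matrix exists, the remaining job is to hand it to a fast parsimony program. The hypothetical `write_mrp_nexus()` below is a sketch that writes a 0/1/? matrix, with taxa as row names, to a NEXUS DATA block. It does not reproduce mrpmatrix's own output format. Whether a given program, TNT included, reads NEXUS directly is worth checking in that program's documentation.

```r
# Hypothetical helper: write an MRP matrix of "0"/"1"/"?" strings (rows =
# taxa, taxon names as row names) as a NEXUS DATA block. Names are written
# unquoted, so names with spaces or punctuation would need quoting first.
write_mrp_nexus <- function(m, con = stdout()) {
  rows <- paste(rownames(m), apply(m, 1, paste, collapse = ""))
  lines <- c(
    "#NEXUS",
    "BEGIN DATA;",
    sprintf("  DIMENSIONS NTAX=%d NCHAR=%d;", nrow(m), ncol(m)),
    '  FORMAT DATATYPE=STANDARD MISSING=? SYMBOLS="01";',
    "  MATRIX",
    paste0("    ", rows),
    "  ;",
    "END;"
  )
  writeLines(lines, con)
  invisible(lines)
}

# Toy two-character matrix for five taxa (matrix() fills column-wise)
m <- matrix(c("1", "1", "0", "0", "?",
              "0", "?", "1", "?", "1"),
            nrow = 5, dimnames = list(c("A", "B", "C", "D", "E"), NULL))
out <- write_mrp_nexus(m)
```

To write to disk instead of the console, pass a file name, e.g. `write_mrp_nexus(m, "mrp.nex")`.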

petermr commented 9 years ago

On Tue, Aug 11, 2015 at 5:43 PM, Ross Mounce notifications@github.com wrote:

I'm going to start looking into MRP matrix creation using R now instead of STK2 for this reason

I think that's a good pragmatic decision. By using NeXML as the primary output of ami-phylo, we can switch between different tools. This problem of undefined "de facto" semi-standards (like Newick) holds science back.

Peter Murray-Rust, Reader in Molecular Informatics, Unilever Centre, Dept. of Chemistry, University of Cambridge, CB2 1EW, UK, +44-1223-763069
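The point about NeXML is that a well-specified XML format decouples ami-phylo from any single downstream tool. As a rough illustration only, the hypothetical `tree_to_nexml()` below takes a tiny rooted tree, given as a parent/child edge table. It writes the tree in NeXML's `otus`/`trees`/`node`/`edge` layout using plain base-R string handling. The helper does no XML escaping, so labels containing `&`, `<` or quotes would break it. It also prefixes IDs so that they start with a letter. Validate the output against the NeXML schema before relying on it.

```r
# Hypothetical sketch: serialise a small rooted tree (edge table with
# parent, child, length columns) as a minimal NeXML document.
# No XML escaping is done; labels must be plain text.
tree_to_nexml <- function(edges, tips) {
  nodes <- unique(c(edges$parent, edges$child))
  root <- setdiff(edges$parent, edges$child)  # the node with no parent
  otus <- sprintf('    <otu id="otu_%s" label="%s"/>', tips, tips)
  node_xml <- vapply(nodes, function(n) {
    attrs <- sprintf('id="n_%s"', n)
    if (n %in% tips) attrs <- paste0(attrs, sprintf(' otu="otu_%s"', n))
    if (n == root) attrs <- paste0(attrs, ' root="true"')
    sprintf("      <node %s/>", attrs)
  }, character(1))
  edge_xml <- sprintf('      <edge id="e%d" source="n_%s" target="n_%s" length="%s"/>',
                      seq_len(nrow(edges)), edges$parent, edges$child,
                      edges$length)
  c('<?xml version="1.0" encoding="UTF-8"?>',
    '<nex:nexml version="0.9" xmlns="http://www.nexml.org/2009"',
    '  xmlns:nex="http://www.nexml.org/2009"',
    '  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">',
    '  <otus id="otus1">', otus, '  </otus>',
    '  <trees id="trees1" otus="otus1">',
    '    <tree id="tree1" xsi:type="nex:FloatTree">',
    node_xml, edge_xml,
    '    </tree>',
    '  </trees>',
    '</nex:nexml>')
}

# Toy tree ((A:0.5,B:0.5)X:1,C:2)root as an edge table
edges <- data.frame(parent = c("root", "root", "X", "X"),
                    child = c("X", "C", "A", "B"),
                    length = c(1, 2, 0.5, 0.5),
                    stringsAsFactors = FALSE)
xml <- tree_to_nexml(edges, tips = c("A", "B", "C"))
writeLines(xml)
```

Tips, internal nodes and branch lengths all become explicit elements and attributes. This leaves less room for the kind of parser-by-parser disagreement over Newick described in this issue.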