Open petermr opened 9 years ago
If that is implying that all trees have to be binary then no, that is not correct.
It is permitted in Newick to have a trifurcation, e.g. (A,(B,C,D)); or a larger polytomy, e.g. (A,(B,(C,D,E,F,G))); Unfortunately, there are many slightly different ways of writing Newick.
On Fri, Aug 7, 2015 at 4:49 PM, Ross Mounce notifications@github.com wrote:
If that is implying that all trees have to be binary then no, that is not correct.
That's what it implies, and it calls itself a Validator.
It is permitted in Newick to have a trifurcation, e.g. (A,(B,C,D)); or a larger polytomy, e.g. (A,(B,(C,D,E,F,G))); Unfortunately, there are many slightly different ways of writing Newick https://en.wikipedia.org/wiki/Newick_format.
That's exactly why it's a problem. It may mean that I will have to create a STK2-specific Newick. In any case the transfer has to be validated.
So the likelihood is that we have a single file of 5000 lines with Newick in? In which case we will at some stage need a tool to summarize the CTrees and create one [1].
[1] Yes we can find/grep/cat to concatenate output, but ultimately summarisation should be done in AMI using some form of map/reduce strategy.
Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069
Does this mean that we can try with a small number of trees to test whether the supertree workflow works (even if the answers are not meaningful)?
From Ross
??? Do you mean input into STK2 from ami? We just need to concatenate all the .nwk files into one big .tre file for STK2. One nwk per line in the STK2. No additional re-shaping or reformatting (provided that the taxon names have already been standardised). At most it will entail the subtraction or addition of semicolons at the end of each line.
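The concatenation step described above could be sketched roughly as follows. This is only an illustration, not an existing AMI or STK2 tool: the function names `normalise_newick` and `concatenate_nwk` are made up here, and it assumes each .nwk file holds exactly one tree and that taxon names have already been standardised.

```python
from pathlib import Path


def normalise_newick(text: str) -> str:
    """Put one Newick string on a single line ending in exactly one semicolon."""
    one_line = " ".join(text.split())
    return one_line.rstrip("; ") + ";"


def concatenate_nwk(ctree_root: Path, out_file: Path) -> int:
    """Write every *.nwk file found under ctree_root to out_file, one tree per line.

    Returns the number of trees written. Hypothetical helper, not part of AMI.
    """
    count = 0
    nwk_files = sorted(ctree_root.rglob("*.nwk"))  # collect before opening the output
    with out_file.open("w", encoding="utf-8") as out:
        for nwk in nwk_files:
            text = nwk.read_text(encoding="utf-8").strip()
            if not text:
                continue  # skip empty files rather than emit a bare ";"
            out.write(normalise_newick(text) + "\n")
            count += 1
    return count
```

Called as `concatenate_nwk(Path("ctrees"), Path("all.tre"))`, where both paths are placeholders, this would give one tree per line regardless of whether the individual files ended in a semicolon.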
We haven't decided where the .nwk files go in the CTree. Since there could be more than one image, there will be more than one .nwk file.
I validated the Newick generated by AMI this morning. I used the command-line mode of TreeGraph 2 to generate new images of the trees in .png and .svg from the .nwk files. 2195 of 2211 were successfully interpreted. Sorry I have not reported this sooner. I will put details about the errors in the errors folder on phylotree ASAP.
So you will flag 16 files as errors in an issue, explain what is wrong and assign them as an issue for me?
I trust TreeGraph 2 as a validator. Some like DendroPy (Python) are useful but too strict - they throw a fit at all the unlabelled taxa, so not so useful at this stage.
That's your shout. My point is that I have to know that AMI output is valid. It sounds like some of it isn't.
Just uploaded it all to https://github.com/ContentMine/phylotree/tree/master/errors/TreeGraph2-validation-tests
I have now posted a separate issue here: https://github.com/ContentMine/phylotree/issues/17 for the specific files which appear to be erroneous
There's an error in GitHub:
Sorry, we had to truncate this directory to 1,000 files. 7,798 entries were omitted from the list.
Are these all files in error? (https://github.com/ContentMine/phylotree/tree/master/errors/TreeGraph2-validation-tests)
We need a description of what these files are. They look like potential input for tests, not errors.
Provide a mechanism for validating *.nwk output.
As an example http://libpll.org/api/group__newickParseGroup.html defines a valid tree as
This implies that every node with more than two children should be expanded into binary form, apart from the root.
Is this a satisfactory validator, and does it validate node labels, etc.?
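To make the question concrete, here is a minimal sketch of a structural check that accepts polytomies and reports them instead of rejecting them. It is not libpll's definition and not what TreeGraph 2 does; `check_newick` is a hypothetical name, labels and branch lengths are skipped rather than validated, and quoted labels are handled only in the simple case.

```python
def check_newick(text):
    """Structurally check one Newick string.

    Returns a dict with keys "valid", "error", "leaves" and "max_children".
    Polytomies are allowed; "max_children" > 2 simply means the tree is not binary.
    """
    s = text.strip()
    report = {"valid": False, "error": None, "leaves": 0, "max_children": 0}

    def fail(message):
        report["error"] = message
        return report

    if not s.endswith(";"):
        return fail("missing terminating semicolon")
    body = s[:-1]
    open_nodes = []   # child count for each "(" that is still open
    prev = "start"    # kind of previous token: start, open, comma, close, label
    i = 0
    while i < len(body):
        ch = body[i]
        if ch.isspace():
            i += 1
            continue
        if ch == ";":
            return fail(f"semicolon before end of tree at position {i}")
        if ch == "(":
            if prev in ("close", "label"):
                return fail(f"unexpected '(' at position {i}")
            open_nodes.append(1)
            prev = "open"
        elif ch in ",)":
            if not open_nodes:
                return fail(f"unexpected {ch!r} at position {i}")
            if prev in ("open", "comma"):
                report["leaves"] += 1  # unlabelled leaf
            if ch == ",":
                open_nodes[-1] += 1
                prev = "comma"
            else:
                report["max_children"] = max(report["max_children"], open_nodes.pop())
                prev = "close"
        else:
            # Label and/or branch length: skip to the next structural character.
            j = i
            if ch == "'":
                j = body.find("'", i + 1)
                if j == -1:
                    return fail("unterminated quoted label")
            while j < len(body) and body[j] not in "(),;":
                j += 1
            if prev != "close":
                report["leaves"] += 1  # a label after ")" names an internal node
            prev = "label"
            i = j
            continue
        i += 1
    if open_nodes:
        return fail("unbalanced parentheses")
    if report["leaves"] == 0:
        return fail("empty tree")
    report["valid"] = True
    return report
```

For example, `check_newick("(A,(B,C,D));")` is reported as valid with `max_children` of 3, so a caller can decide separately whether non-binary trees are acceptable for the downstream tool.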