Open josephwb opened 9 years ago
Can you post the trees somewhere? Also note that a recent commit added a - between the words in the tools dir of otc. So these commands will be otc-prune-... and otc-find-un...
Hmm. I don't see that taxon in either the fungi.synth.tre or the Fungi_taxonomy.tre that you posted.
I do see it in the Pruned_Fungi_taxonomy.tre.
I believe that otc-find-unsupported-nodes assumes that the leaf sets of the first 2 trees (the taxonomy and the synthetic tree) are identical. So perhaps the lack of that taxon in fungi.synth.tre is causing this.
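For anyone who wants to check that assumption before running the tool, here is a minimal Python sketch that compares the tip sets of two Newick strings by OTT id. It is not part of otcetera; the function names are hypothetical, the trees in the example are toy data, and it assumes unquoted labels that end in ott<digits> (as in the grep output in this thread).

```python
import re


def leaf_ott_ids(newick):
    """Return the set of OTT ids found on the tip labels of a Newick string.

    Assumption: labels are unquoted and end in 'ott<digits>'.
    """
    # A tip label directly follows "(" or ","; internal labels follow ")".
    tip_labels = re.findall(r"[(,]\s*([^(),:;\s]+)", newick)
    ids = set()
    for label in tip_labels:
        match = re.search(r"ott(\d+)$", label)
        if match:
            ids.add(int(match.group(1)))
    return ids


def leaf_set_difference(taxonomy_newick, synth_newick):
    """Return (ids only in the taxonomy tips, ids only in the synth tips)."""
    tax = leaf_ott_ids(taxonomy_newick)
    syn = leaf_ott_ids(synth_newick)
    return sorted(tax - syn), sorted(syn - tax)


if __name__ == "__main__":
    # Toy trees, not the real fungal files.
    taxonomy = "((ott11,ott12)ott10,ott20)ott1;"
    synth = "(Family_ott10,Genus_ott20)ott1;"
    only_tax, only_synth = leaf_set_difference(taxonomy, synth)
    print("tips only in taxonomy:", only_tax)
    print("tips only in synth tree:", only_synth)
```

If either list is non-empty, the two trees do not share a leaf set and the assumption described above is violated.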
Ok, maybe that was an old file.
Ok, yeah, I just included the wrong files in the link above. This one should work (or, er, shouldn't work. As expected, that is.).
grep 4085684 Unfiltered_Fungi_taxonomy.tre
# TRUE
grep 4085684 fungi.synth.tre
# FALSE
otcprunetaxonomy Unfiltered_Fungi_taxonomy.tre fungi.synth.tre > Pruned_Fungi_taxonomy.tre
grep 4085684 Pruned_Fungi_taxonomy.tre
# TRUE
4085684 is not in fungi.synth.tre, but that tree does have a tip that is mapped to the parent of 4085684. I think that otcprunetaxonomy expands all non-terminal taxa that are assigned to tips into the full set of terminal taxa below them:
$ grep -o -P '.{0,40}529465.{0,20}' fungi.synth.tre
choascus_ott4085916,Phaffomycetaceae_ott529465,((Clavispora_opunti
$ grep -o -P '.{0,20}4085684.{0,20}' Unfiltered_Fungi_taxonomy.tre
31290,ott4931289,ott4085684)ott529465,((ott4085
I think that is why this tip is not getting pruned.
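To make the suspected behavior concrete, here is a small Python sketch of that kind of expansion. It only illustrates the hypothesis above and is not otcetera's actual code; the function names and the children map are hypothetical. The map lists only the two children of ott529465 that are visible in the truncated grep output, and treating ott4085916 as terminal is an assumption.

```python
def expand_to_terminals(taxon_id, children):
    """Return the set of terminal taxa at or below taxon_id."""
    stack = [taxon_id]
    terminals = set()
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if kids:
            stack.extend(kids)      # non-terminal: keep descending
        else:
            terminals.add(node)     # terminal taxon: retain it
    return terminals


def retained_terminals(tip_ids, children):
    """Union of the terminal taxa implied by every tip of the other trees."""
    keep = set()
    for tip in tip_ids:
        keep |= expand_to_terminals(tip, children)
    return keep


if __name__ == "__main__":
    # Hypothetical, partial taxonomy: only the children visible in the grep output.
    children = {529465: [4931289, 4085684]}
    # Two tips seen in fungi.synth.tre; 4085916 is assumed terminal here.
    synth_tips = {4085916, 529465}
    print(sorted(retained_terminals(synth_tips, children)))
```

Under this reading, 4085684 is retained in the pruned taxonomy because its parent is a tip of the synth tree, even though 4085684 itself never appears there.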
So, that won't work, right? Here I pruned by synth tree and inputs:
otcprunetaxonomy Unfiltered_Fungi_taxonomy.tre fungi.synth.tre fungitrees/*.tre > Pruned_Fungi_taxonomy_all-inputs.tre
Still contains the taxon:
grep 4085684 Pruned_Fungi_taxonomy_all-inputs.tre
# TRUE
When I run otcfindunsupportednodes, it terminates as above because the synth tree does not contain the taxon.
BTW, the error reported above (note: the problematic taxon is different, because I used a different taxonomy tree):
josephwb@WOPR:~/Desktop/for_MTH2$ otcfindunsupportednodes Pruned_Fungi_taxonomy_all-inputs.tre fungi.synth.tre fungitrees/*.tre
2015-03-30 10:04:02,475 INFO [default] reading "Pruned_Fungi_taxonomy_all-inputs.tre"...
2015-03-30 10:04:12,415 INFO [default] reading "fungi.synth.tre"...
2015-03-30 10:04:19,975 INFO [default] reading "fungitrees/ott1010493.tre"...
2015-03-30 10:04:19,975 INFO [default] reading "fungitrees/ott1026597.tre"...
2015-03-30 10:04:19,977 INFO [default] reading "fungitrees/ott103001.tre"...
2015-03-30 10:04:19,978 INFO [default] reading "fungitrees/ott103002.tre"...
2015-03-30 10:04:19,983 INFO [default] reading "fungitrees/ott1031212.tre"...
2015-03-30 10:04:19,984 INFO [default] reading "fungitrees/ott104185.tre"...
ERROR. Exiting due to an exception:
OTT id not found 222914
is a little confusing: it dies in the middle of processing the inputs (i.e. not all inputs are listed above; many more to go). It seems like file "ott1098854.tre" is the problem, but I believe it is simply that "fungi.synth.tre" does not contain the taxon that is present in "Pruned_Fungi_taxonomy_all-inputs.tre". It seems like it starts processing the input trees before it has decided there is a conflict between the taxonomy and synth trees.
Anyway, unless I am off my rocker, this is not working as desired (i.e. to produce files conducive to downstream testing).
Just a follow-up note on this.
The key to avoiding this problem (for me) is to first filter (currently using python) taxonomy by what treemachine skips (so that the taxonomy should contain the same tip set as the synthetic tree). Problem is, especially for the proverbial Joe-Shmoe, what treemachine actually uses to filter is not immediately obvious (e.g.).
So, yeah, it would be great if this method threw out such problematic taxa as above using just the inputs, so that a user wouldn't have to be aware of whatever taxonomy pruning has occurred elsewhere.
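For reference, the kind of filtering described above can be sketched roughly as follows. This is not the script I actually use, nor what treemachine does internally; it is a hypothetical illustration on a toy parent-to-children map, and the function name is made up. It keeps only the paths that lead to a tip of the synthetic tree, and it turns a non-terminal taxon that is itself a synth tip into a leaf instead of expanding it.

```python
def prune_to_tips(root, children, keep_tips):
    """Return a children map holding only the paths that lead to keep_tips.

    Recursive, which is fine for taxonomy-depth trees.
    """
    pruned = {}

    def visit(node):
        kids = children.get(node, [])
        if not kids:
            # Terminal taxon: keep it only if it is a tip of the synth tree.
            return node in keep_tips
        kept = [kid for kid in kids if visit(kid)]
        if kept:
            pruned[node] = kept
            return True
        # No descendant survived: keep this taxon as a leaf only if it is
        # itself a tip of the synth tree (a tip mapped to a non-terminal taxon).
        return node in keep_tips

    visit(root)
    return pruned


if __name__ == "__main__":
    # Toy taxonomy: 1 is the root, 2 and 3 are non-terminal, 4-6 are terminal.
    children = {1: [2, 3], 2: [4, 5], 3: [6]}
    synth_tips = {2, 6}  # tip 2 is mapped to a non-terminal taxon
    print(prune_to_tips(1, children, synth_tips))
```

In the toy example, taxa 4 and 5 are dropped and taxon 2 becomes a leaf, so the pruned taxonomy ends up with the same tip set as the synth tree.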
I'll certainly leave this open as a feature request. The otc-uncontested-decompose and otc-find-unsupported-nodes tools were written assuming that the leaf set of the taxonomy and the leaf set of the synth tree are the same. It shouldn't be too hard to deal with these non-terminal cases internally, but I'm afraid that I won't get to it soon. So the workaround of using a taxonomy and synth tree with the same leaf set will have to do for the time being (unless someone else wants to fix this).
It is far from complete, but I have been working on documenting the thinking behind otcetera in the doc subdir (PDF version posted at http://phylo.bio.ku.edu/ot/summarizing-taxonomy-plus-trees.pdf).
Agreed. Not really an issue for me now that I've got an appropriately filtered taxonomy.
I'm trying to test sub-synth trees for unsupported edges, but am having problems with "missing" OTT IDs (specifically, an OTT ID present in the taxonomy, but not the synth tree or input trees).
What I am doing: First: prune the taxonomy tree using the synthetic tree label set:
Second: run the unsupported edge test:
The OTT ID 4085684 appears both in my original "Fungi_taxonomy.tre" and otcetera-generated "Pruned_Fungi_taxonomy.tre". The taxon itself is barren. I can filter such "dubious" taxa from my taxonomy tree, but I thought otcprunetaxonomy would accomplish this.