biocore / empress

A fast and scalable phylogenetic tree viewer for microbiome data analysis
BSD 3-Clause "New" or "Revised" License

Strip whitespace around node names? #480

Open: fedarko opened this issue 3 years ago

fedarko commented 3 years ago

I was testing EMPress on an old Newick file I had lying around (representing a tree shown in this article: https://rachel53461.wordpress.com/2014/04/20/algorithm-for-drawing-trees), and I noticed something strange.

The Newick file (I think I just put this together manually when I originally read this article?) looks like this:

((a:1, (b:1, c:1)d:1)e:1, f:1, (g:1, (h:1, i:1, j:1, k:1, l:1)m:1)n:1)o:1;

The tree loads in EMPress as normal, but, interestingly, searching for node f doesn't work. This is because the space before f in the Newick file is being treated as part of the node's name, so to find this node I had to type [space character goes here]f in the search bar.
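One way whitespace can leak into a name: a parser that treats a label as "everything between one structural character and the next" keeps the space that follows each comma. The sketch below is a deliberately naive, hypothetical label extractor (`naive_labels` is not part of IOW or EMPress, and it ignores quoted labels and comments). It only illustrates the mechanism; it is not a claim about how IOW actually parses this file.

```python
import re

NEWICK = "((a:1, (b:1, c:1)d:1)e:1, f:1, (g:1, (h:1, i:1, j:1, k:1, l:1)m:1)n:1)o:1;"


def naive_labels(newick):
    """Return node labels as a naive tokenizer would see them (hypothetical).

    A label is taken to be the run of characters that follows '(', ',' or ')'
    and stops at the next structural character. Whitespace is NOT stripped.
    """
    raw = re.findall(r"[(),]([^(),:;]*)", newick)
    # Drop runs that are empty or pure whitespace (e.g. the space in ", (").
    return [label for label in raw if label.strip()]


labels = naive_labels(NEWICK)
print(labels)
```

On this string the naive rule yields `' f'` rather than `'f'`, along with other space-prefixed labels such as `' c'` and `' i'`.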

Interestingly, the selected node menu lists this node's name as just f, with no leading whitespace. I'm not sure whether this is an artifact of how the JS code works or something the DOM is doing automatically.


Anyway, long story short: it might be worth adding a step to EMPress' Python code that, once a tree is obtained from the IOW library, strips leading/trailing whitespace from node names and then continues with the usual validity checks (looking for duplicate tip names, etc.). But I am not sure whether this is a good way to handle this, or whether whitespace in Newick is even valid. I imagine this is a pretty rare corner case, since most tree files are probably computer-generated (and therefore probably won't have redundant spacing).
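A minimal sketch of the proposed step, assuming the tip names have already been pulled out of the tree into a plain list. `normalize_tip_names` is a hypothetical helper, not an existing EMPress or IOW function, and it does not touch any real tree object. (For what it's worth, Felsenstein's Newick description says blanks may appear anywhere except within unquoted labels, which suggests surrounding whitespace on an unquoted name is not meant to be part of it; a quoted label could still include it on purpose, which this sketch does not try to distinguish.)

```python
from collections import Counter


def normalize_tip_names(names):
    """Strip leading/trailing whitespace from tip names, then check for duplicates.

    Hypothetical helper: `names` is a plain list of tip names (None = unnamed),
    not an IOW or scikit-bio tree object.
    """
    cleaned = []
    for name in names:
        if name is None:
            cleaned.append(None)
            continue
        stripped = name.strip()
        # A name that was only whitespace is treated as unnamed.
        cleaned.append(stripped if stripped else None)

    # Run the duplicate check AFTER stripping, so "f" and " f" collide.
    counts = Counter(n for n in cleaned if n is not None)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    if duplicates:
        raise ValueError(
            "Duplicate tip names after stripping whitespace: " + ", ".join(duplicates)
        )
    return cleaned


print(normalize_tip_names(["a", " f", "g "]))
```

The ordering is the main design choice: stripping first means the existing duplicate-tip check also catches names that differ only by surrounding whitespace, instead of letting `f` and ` f` through as distinct tips.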