arklumpus / TreeViewer

Cross-platform software to draw phylogenetic trees
GNU Affero General Public License v3.0

Odd separator detection when importing data from attachment #37

Open Ge94 opened 5 months ago

Ge94 commented 5 months ago

Hi, thank you for this resource! It's quite useful and intuitive for most things. I really appreciated the examples in your wiki.

I would like to report an odd case. I wrote an attachment in the following format:

TaxID  Color
1117    #999933
1118    #999933
...

The separator is a simple \t. I wanted to pass the attachment through the "parse node states" module, but no data were detected, even when I specified \t as the separator (regardless of whether the regex box was ticked). After opening the attachment in the spreadsheet editor, I discovered that the detected separator regex was [\t ]+ (edit to add: I just noticed an extra white space after the tab in the spreadsheet editor). I am not sure whether this is standard behaviour; I am working on a Linux distro. Could you please double check this?

As a side recommendation, when studying the different examples I noticed that slightly different things are shown in different wikis. For example, when investigating the "parse node states" function here and here I learnt different things; in particular, I only read about the spreadsheet trick after I had already spent quite a while thinking about my issue. Could I suggest taking a look at the sections that are more or less repeated throughout the wiki, and providing the same info everywhere?

Thank you again for this cool visualisation tool.

arklumpus commented 5 months ago

Hi, thank you for your interest in TreeViewer!

That extra white space after the tab is probably the key! The most likely explanation is that, in your attachment file, you have some columns with tabs and some with spaces (even just one space may be enough).
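One way to check for this kind of mixed whitespace outside TreeViewer is to print the rows with their control characters made visible. The following is only a rough sketch in Python, not something TreeViewer provides; the function name `irregular_separators` and the sample rows are made up for illustration.

```python
import re

def irregular_separators(lines):
    """Return (line_number, repr(row)) for rows containing any run of
    whitespace (including trailing whitespace) that is not exactly one tab."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")           # drop the line ending only
        gaps = re.findall(r"\s+", text)      # every run of whitespace in the row
        if any(gap != "\t" for gap in gaps):
            flagged.append((number, repr(text)))
    return flagged

# Hypothetical rows: the second one has a stray space after the tab.
sample = ["1117\t#999933", "1118\t #999933"]
for number, shown in irregular_separators(sample):
    print(number, shown)
```

Because `repr` shows a tab as `\t` and leaves spaces as they are, a row such as `'1118\t #999933'` makes the extra space easy to spot.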

I can't be sure based on the snippet you posted (because I think GitHub converted all tabs to spaces), but it should be easy to check if you can attach the full file. Here, I can see only two spaces between TaxID and Color, while there are four in the following rows, but I don't know if this reflects the actual file. However, did you try with the default separator (\s)? This should actually work because it matches both tabs and spaces.
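To illustrate why the choice of separator matters when a row mixes tabs and spaces, here is a small sketch using Python's `re` module. It only shows how regex splitting behaves in Python and is not TreeViewer's own parsing code, which may differ in detail. The helper `split_row` and the sample row are hypothetical, and the sketch uses `\s+` rather than a bare `\s`, on the assumption that a run of whitespace counts as one separator.

```python
import re

def split_row(row, pattern):
    """Split one row into fields using a regex separator."""
    return re.split(pattern, row)

# Hypothetical row with a stray space after the tab.
row = "1118\t #999933"

# Separators mentioned in this thread, written as Python regex patterns.
separators = {
    "tab only": r"\t",
    "tabs or spaces": r"[\t ]+",
    "any whitespace (assumed to collapse runs)": r"\s+",
}

for label, pattern in separators.items():
    print(f"{label}: {split_row(row, pattern)}")
```

With the tab-only pattern, the second field keeps its leading space (`' #999933'`), so it would no longer match a clean colour value. The other two patterns absorb the stray space.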

Apologies if the style of some tutorials can be a bit confusing - as you can imagine, they were written at different times, when some features did not exist yet, so they sometimes show different ways of doing the same thing...

Ge94 commented 5 months ago

Hi! Thank you for taking the time to check my issue.

Sorry, I might have misled you - I typed the first lines of my file in the GitHub editor myself just to explain the format. I am attaching the full file here for completeness. Unfortunately, the default separator did not detect anything...

annotationFile.data.txt

arklumpus commented 5 months ago

Thank you for sending along the file! It looks mostly fine to me, but a problem I see is that there are some duplicate taxids (265, 356, and 976). You should notice that this causes an error in TreeViewer, because a warning icon appears in the status bar:

[image: warning icon in the status bar]

And if you click on this icon, you get a description of the problem:

[image: description of the problem shown after clicking the icon]

After fixing this (i.e., removing the duplicate taxids), the file works fine for me even with the default separator... I'm not sure if there might be something else going on in your plot (like the taxids being stored as strings rather than numbers). If you would like to send the full tree file, it would help to diagnose the problem.
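Duplicate identifiers like these can also be found before importing the file. The snippet below is a minimal sketch in Python, assuming the identifier is the first whitespace-separated column. The function `duplicate_keys` and the sample rows (including the colour values) are invented for illustration and are not taken from the attached file.

```python
import re
from collections import Counter

def duplicate_keys(lines, separator=r"\s+"):
    """Return first-column values that occur more than once,
    in order of first appearance. Blank lines are skipped."""
    keys = [re.split(separator, line.strip())[0] for line in lines if line.strip()]
    counts = Counter(keys)  # keeps insertion order on Python 3.7+
    return [key for key, count in counts.items() if count > 1]

# Hypothetical sample: taxid 265 appears twice.
sample = ["TaxID\tColor", "265\t#999933", "356\t#117733", "265\t#332288"]
print(duplicate_keys(sample))
```

To run it on a real file, the open file object can be passed directly, for example `with open("annotationFile.data.txt") as handle: print(duplicate_keys(handle))`.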