emmanuelparadis / ape

analysis of phylogenetics and evolution
http://ape-package.ird.fr/
GNU General Public License v2.0
52 stars 11 forks source link

read.tree() silently (but correctly) fails on malformed Newick file #114

Closed arcresu closed 5 months ago

arcresu commented 5 months ago

If read.tree() encounters a Newick file that is missing its trailing semicolon, it silently returns NULL thanks to this line: https://github.com/emmanuelparadis/ape/blob/9d03d1fc37f5de765fd3f2464037dc183ed74b66/R/read.tree.R#L73

Would it be possible to add a warning("trailing semicolon not found") or similar before returning here? I had a tree with this issue and it opened fine in some other tree viewers so it took some debugging to figure out why ape couldn't read it. Thanks!

emmanuelparadis commented 5 months ago

Thanks for reporting this. Looking at the code, maybe this could be tested a few lines above: https://github.com/emmanuelparadis/ape/blob/9d03d1fc37f5de765fd3f2464037dc183ed74b66/R/read.tree.R#L58 The test would then be:

if (all(y == -1))  {
     warning("no semicolon(s) [end(s) of tree] found")
    return(NULL)
}

I had a tree with this issue and it opened fine in some other tree viewers

I guess these viewers would not work if the Newick file has more than one tree. read.tree() can work with many trees, so the use of ; is critical. Cheers, E.

arcresu commented 5 months ago

Thanks, yes that sounds sensible.

I'm coming from a context where it's unusual to have more than one tree per file and some tools aren't careful with the semicolons. iTOL is one tree viewer that copes with these slightly malformed one-tree Newick files. I don't see any problem with read.tree() aborting since it expects to handle multiple trees, but the warning might save someone some debugging time in future.

KlausVigo commented 5 months ago

Hi @arcresu, might be also useful to inform the authors of the program you got the tree from that they should add a semicolon. I did this for a program called GRASP recently. Kind regards, Klaus

Thanks, yes that sounds sensible.

I'm coming from a context where it's unusual to have more than one tree per file and some tools aren't careful with the semicolons. iTOL is one tree viewer that copes with these slightly malformed one-tree Newick files. I don't see any problem with read.tree() aborting since it expects to handle multiple trees, but the warning might save someone some debugging time in future.

emmanuelparadis commented 5 months ago

Fixed and pushed here (with updated date).