OneZoom / tree-build

Scripts for assembling the tree, metadata and downstream data products such as popularity and popular images
MIT License

Missing taxa with new open tree (14.9) #29

Closed (hyanwong closed 4 months ago)

hyanwong commented 10 months ago

With the most recent OpenTree, v14.9, in step 3 of https://github.com/OneZoom/tree-build/blob/main/oz_tree_build/README.markdown, I'm getting some warnings:

WARNING:root:Could not find the following taxa: 4795972, 385878, 747323, 5205266, 523271

This then feeds through to step 4, which now warns:

WARNING:root:Subtree file OpenTreeParts/OpenTree_all/523271.nwk does not exist
WARNING:root:Subtree file OpenTreeParts/OpenTree_all/747323.nwk does not exist
WARNING:root:Subtree file OpenTreeParts/OpenTree_all/385878.nwk does not exist

These are likely to be taxa that have vanished from the OpenTree, e.g.

davidebbo commented 10 months ago

Just a note that these 5 taxa were also not in 14.7, so this is not new to 14.9.

davidebbo commented 10 months ago

Focusing on 385878 (Glomeromycota), it was in 12.3, but not in 13.5.

Also, in 12.3, Glomeromycota has 3 direct children:

In the latest tree, Glomeromycetes is gone. The other two still exist.

I really need better tools to compare large trees...
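
As a stopgap for comparing large trees, one rough approach is to diff the sets of OTT ids that appear in two Newick files. This is only a sketch: it assumes taxa are labelled in the form `Name_ott123`, the helpers `ott_ids` and `vanished_ids` are hypothetical (not part of tree-build), and the two toy trees are made up rather than real OpenTree data.

```python
import re

# Assumption: taxa are labelled like "Glomeromycota_ott385878" in the Newick text.
OTT_LABEL = re.compile(r"_ott(\d+)")


def ott_ids(newick_text):
    """Return the set of OTT ids (as ints) mentioned in a Newick string."""
    return {int(ott) for ott in OTT_LABEL.findall(newick_text)}


def vanished_ids(old_newick, new_newick):
    """Return the sorted OTT ids present in the old tree but absent from the new one."""
    return sorted(ott_ids(old_newick) - ott_ids(new_newick))


# Toy trees for illustration only: the internal node C_ott3 loses its label.
old = "((A_ott1,B_ott2)C_ott3,D_ott4)E_ott5;"
new = "((A_ott1,B_ott2),D_ott4)E_ott5;"
print(vanished_ids(old, new))
```

This only reports which ids disappeared between versions, not why. Working out the cause (for example a taxon that is no longer monophyletic) still needs a look at the tree itself.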

hyanwong commented 5 months ago

I'm just looking into this again. These are the errors we now get with the standard pipeline, on OpenTree 14.9:

WARNING:root:Could not find the following taxa: 4795972, 523271, 5205266, 385878, 747323

WARNING:root:Subtree file OpenTreeParts/OpenTree_all/523271.nwk does not exist
WARNING:root:Subtree file OpenTreeParts/OpenTree_all/747323.nwk does not exist
WARNING:root:Subtree file OpenTreeParts/OpenTree_all/385878.nwk does not exist

# The following taxa have not been correctly substituted
_LobataPlus_ott523271~-574291-1580-1583-747323@
Pleurobrachiidae_ott747323@
Glomeromycota_ott385878@

davidebbo commented 5 months ago

This looks identical to what you initially reported in this issue, no? I think the fix would be to adjust our bespoke files to stop referencing OTTs that are no longer in OpenTree.

For instance, Holomycota.PHY has:

(Mucoromycotina_ott564951@:690,Glomeromycota_ott385878@:690)Mycoromyceta:10

But 385878 is gone (see my earlier comment).
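
To find every such reference, a small script could scan the bespoke files for `Name_ottNNN@` includes and flag those whose id is in the "Could not find the following taxa" list. This is a sketch under assumptions: the regular expression is inferred only from the lines quoted in this issue, any `~...` suffix before the `@` is skipped without being interpreted, and `stale_includes` is a hypothetical helper, not an existing tree-build function.

```python
import re

# Assumption: an include reference looks like "Name_ott123@", optionally with a
# "~..." suffix before the "@" (as in the _LobataPlus_ line quoted in this issue).
INCLUDE_REF = re.compile(r"(\w*?)_ott(\d+)(?:~[-\d]*)?@")


def stale_includes(phy_text, missing_ids):
    """Return (name, ott_id) pairs for include references whose id is in missing_ids."""
    return [
        (m.group(1), int(m.group(2)))
        for m in INCLUDE_REF.finditer(phy_text)
        if int(m.group(2)) in missing_ids
    ]


# Ids taken from the step 3 warning reported in this issue.
missing = {4795972, 385878, 747323, 5205266, 523271}
line = "(Mucoromycotina_ott564951@:690,Glomeromycota_ott385878@:690)Mycoromyceta:10"
print(stale_includes(line, missing))
```

Running this over each bespoke file's text would give a list of references to review. Deciding what to substitute for each one remains a manual, taxonomic judgement.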

The hard part is to fix this up in a scientifically 'correct' way, which is above my head :)

@hyanwong what are your thoughts on tackling this?

hyanwong commented 5 months ago

I think I need to spend a few hours looking at the taxonomy and what's been recently fixed in the OpenTree. I'll do that this week, since my head is in the OneZoom stuff.

davidebbo commented 5 months ago

I tried to dig into Glomeromycota a bit. Based on OpenTree, all the following are no longer monophyletic (from this list):

Glomeromycota (385878)
  Glomeromycetes (265358)
    Diversisporales (385884)
    Glomerales (157948)

It appears that the conflict is caused by this 2021 paper: http://dx.doi.org/10.3389/ffunb.2021.716385:

we explored the phylogenetic relationships within Glomeromycota. Our results support family level classification from previous phylogenetic studies, and the polyphyly of the order Glomerales with Claroideoglomeraceae as the sister group to Glomeraceae and Diversisporales.

But going beyond this is above my head, so I'll stop :)

hyanwong commented 4 months ago

I've almost fixed the missing taxa: only Glomeromycota to go. I guess I'll just go with the RAxML tree from that supposedly conflicting paper:

[Image: Screenshot 2024-05-07 at 16 42 08]

Seems like a bug that it's causing Glomeromycota to vanish, so I've reported it at https://github.com/OpenTreeOfLife/feedback/issues/589