globalbioticinteractions / nomer

maps identifiers and names to other identifiers and names
GNU General Public License v3.0

add support for matching names to phylogeny tips provided by Upham et al. 2019 #158

Open jhpoelen opened 1 year ago

jhpoelen commented 1 year ago

As discussed with @ajacsherman et al., we'd like to add support for matching names against phylogeny tips as published by:

Upham NS, Esselstyn JA, Jetz W. Inferring the mammal tree: Species-level sets of phylogenies for questions in ecology, evolution, and conservation. PLoS Biol. 2019 Dec 4;17(12):e3000494. doi: 10.1371/journal.pbio.3000494. PMID: 31800571; PMCID: PMC6892540.

jhpoelen commented 1 year ago

Data supplements published via:

Upham, Nathan S.; Esselstyn, Jacob A.; Jetz, Walter (2019), Inferring the mammal tree: Species-level sets of phylogenies for questions in ecology, evolution, and conservation, Dryad, Dataset, https://doi.org/10.5061/dryad.tb03d03

with a 4 GB zip file containing:

$ unzip -l doi_10.5061_dryad.tb03d03__v4.zip
Archive:  doi_10.5061_dryad.tb03d03__v4.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
 26162477  2023-06-21 07:48   Data_S6_patchClade_runfiles.zip
  6920653  2023-06-21 07:48   Data_S2_geneTree_files.zip
  8195052  2023-06-21 07:48   Data_S3_globalRAxML_files.zip
  4359516  2023-06-21 07:48   Data_S1_geneChecking_and_masterTaxonomy.zip
636119340  2023-06-21 07:48   Data_S8_finalFigureFiles.zip
  4520185  2023-06-21 07:48   Data_S4_patchClade_results_and_MCC.zip
3712197364  2023-06-21 07:48   Data_S7_Mammalia_credibleTreeSets_tipDR.zip
  1831991  2023-06-21 07:48   Data_S5_backboneDating_runfiles_and_MCC.zip
---------                     -------
4400306578                     8 files

with Data_S7_Mammalia_credibleTreeSets_tipDR.zip containing:

$ unzip -l Data_S7_Mammalia_credibleTreeSets_tipDR.zip 
Archive:  Data_S7_Mammalia_credibleTreeSets_tipDR.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2019-09-25 15:32   Data_S7_Mammalia_credibleTreeSets_tipDR/
1273572508  2019-07-14 23:13   Data_S7_Mammalia_credibleTreeSets_tipDR/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_all10k_v2_nexus.trees
        0  2019-09-25 15:22   Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/
  3090277  2019-07-15 08:40   Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_MCC_v2_target.tre
        0  2019-09-25 15:40   __MACOSX/
        0  2019-09-25 15:40   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/
        0  2019-09-25 15:40   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/
      220  2019-07-15 08:40   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/._MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_MCC_v2_target.tre
  3144609  2019-07-15 09:25   Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_FBDasZhouEtAl_MCC_v2_target.tre
      220  2019-07-15 09:25   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/._MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_FBDasZhouEtAl_MCC_v2_target.tre
     6148  2019-09-25 15:22   Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/.DS_Store
      120  2019-09-25 15:22   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/._.DS_Store
   351745  2019-09-24 22:50   Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_FBDasZhouEtAl_MCC_v2_PLOTTED.pdf
      177  2019-09-24 22:50   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/._MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_FBDasZhouEtAl_MCC_v2_PLOTTED.pdf
   350503  2019-09-24 21:54   Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_MCC_v2_PLOTTED.pdf
      233  2019-09-24 21:54   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/DNAonly_MCCs/._MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_MCC_v2_PLOTTED.pdf
    10244  2019-09-25 15:32   Data_S7_Mammalia_credibleTreeSets_tipDR/.DS_Store
      120  2019-09-25 15:32   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/._.DS_Store
1852405303  2019-07-15 00:19   Data_S7_Mammalia_credibleTreeSets_tipDR/MamPhy_fullPosterior_BDvr_Completed_5911sp_topoCons_NDexp_all10k_v2_nexus.trees
1881543473  2019-07-15 11:07   Data_S7_Mammalia_credibleTreeSets_tipDR/MamPhy_fullPosterior_BDvr_Completed_5911sp_topoCons_FBDasZhouEtAl_all10k_v2_nexus.trees
1298801361  2019-07-14 23:08   Data_S7_Mammalia_credibleTreeSets_tipDR/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_FBDasZhouEtAl_all10k_v2_nexus.trees
        0  2019-09-25 15:32   Data_S7_Mammalia_credibleTreeSets_tipDR/Completed_tipDR_all10k/
  1362126  2019-09-24 08:38   Data_S7_Mammalia_credibleTreeSets_tipDR/Completed_tipDR_all10k/DR-SUMMARY_MamPhy_BDvr_Completed_5911sp_topoCons_FBDasZhouEtAl_all10k_v2_expanded.txt
        0  2019-09-25 15:46   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/Completed_tipDR_all10k/
      176  2019-09-24 08:38   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/Completed_tipDR_all10k/._DR-SUMMARY_MamPhy_BDvr_Completed_5911sp_topoCons_FBDasZhouEtAl_all10k_v2_expanded.txt
1065782805  2019-09-23 19:13   Data_S7_Mammalia_credibleTreeSets_tipDR/Completed_tipDR_all10k/DR-matrix_MamPhy_BDvr_Completed_5911sp_topoCons_NDexp_all10k_v2.txt
1062985577  2019-09-24 02:51   Data_S7_Mammalia_credibleTreeSets_tipDR/Completed_tipDR_all10k/DR-matrix_MamPhy_BDvr_Completed_5911sp_topoCons_FBDasZhouEtAl_all10k_v2_prune5911.txt
      176  2019-09-24 02:51   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/Completed_tipDR_all10k/._DR-matrix_MamPhy_BDvr_Completed_5911sp_topoCons_FBDasZhouEtAl_all10k_v2_prune5911.txt
  1368273  2019-09-23 21:57   Data_S7_Mammalia_credibleTreeSets_tipDR/Completed_tipDR_all10k/DR-SUMMARY_MamPhy_BDvr_Completed_5911sp_topoCons_NDexp_all10k_v2_expanded.txt
      176  2019-09-23 21:57   __MACOSX/Data_S7_Mammalia_credibleTreeSets_tipDR/Completed_tipDR_all10k/._DR-SUMMARY_MamPhy_BDvr_Completed_5911sp_topoCons_NDexp_all10k_v2_expanded.txt
---------                     -------
8444776570                     30 files

of which the following files appear to contain NEXUS trees:

$ unzip -l Data_S7_Mammalia_credibleTreeSets_tipDR.zip  | grep nexus
1273572508  2019-07-14 23:13   Data_S7_Mammalia_credibleTreeSets_tipDR/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_all10k_v2_nexus.trees
1852405303  2019-07-15 00:19   Data_S7_Mammalia_credibleTreeSets_tipDR/MamPhy_fullPosterior_BDvr_Completed_5911sp_topoCons_NDexp_all10k_v2_nexus.trees
1881543473  2019-07-15 11:07   Data_S7_Mammalia_credibleTreeSets_tipDR/MamPhy_fullPosterior_BDvr_Completed_5911sp_topoCons_FBDasZhouEtAl_all10k_v2_nexus.trees
1298801361  2019-07-14 23:08   Data_S7_Mammalia_credibleTreeSets_tipDR/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_FBDasZhouEtAl_all10k_v2_nexus.trees
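Each of these `.trees` files is 1.2 to 1.9 GB, so listing their tips should not require reading them to the end. Assuming they follow the common NEXUS layout, where a `TRANSLATE` table in the `TREES` block maps short tokens to tip labels before the first `tree` statement (not verified against these specific files), a minimal Python sketch could stream a file straight out of the zip and stop early. The function names `translate_tips` and `tips_from_zip` are hypothetical, and the parser assumes one `TRANSLATE` entry per line.

```python
import io
import re
import zipfile
from typing import Iterable, List

# One TRANSLATE entry per line, e.g. "1 Genus_species," or "2 'Genus species';"
_ENTRY = re.compile(r"^\s*(\S+)\s+('?)(.+?)\2\s*[,;]?\s*$")


def translate_tips(lines: Iterable[str]) -> List[str]:
    """Return tip labels from the first TRANSLATE table of a NEXUS stream.

    Stops at the end of the table or at the first 'tree' statement, so a
    multi-GB posterior tree file is not read to the end.
    """
    tips: List[str] = []
    in_translate = False
    for raw in lines:
        line = raw.strip()
        lower = line.lower()
        if lower.startswith("tree "):
            break  # tree statements come after any TRANSLATE table
        if lower.startswith("translate"):
            in_translate = True
            continue
        if in_translate:
            if line == ";":
                break  # table closed on its own line
            match = _ENTRY.match(line)
            if match:
                tips.append(match.group(3))
            if line.endswith(";"):
                break  # last entry carried the closing semicolon
    return tips


def tips_from_zip(zip_path: str, member: str) -> List[str]:
    """Stream one NEXUS member out of a zip archive without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            return translate_tips(io.TextIOWrapper(raw, encoding="utf-8"))
```

For example, after unzipping Data_S7_Mammalia_credibleTreeSets_tipDR.zip from the Dryad archive, `tips_from_zip("Data_S7_Mammalia_credibleTreeSets_tipDR.zip", "Data_S7_Mammalia_credibleTreeSets_tipDR/MamPhy_fullPosterior_BDvr_Completed_5911sp_topoCons_NDexp_all10k_v2_nexus.trees")` would list that file's tips. If a file uses a `TAXLABELS` block or untranslated labels instead, this sketch returns an empty list and would need adjusting.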

@n8upham, which resource should I use to have nomer map taxonomic names to their corresponding tips in the phylogenetic trees?

myrmoteras commented 1 year ago

There is an effort among GBIF and phylogeny specialists, e.g. Open Tree of Life, to do this and make them accessible for use in GBIF and beyond:

jhpoelen commented 1 year ago

@myrmoteras thanks for sharing that GBIF and phylogeny specialists are working on linking specimens to their associated phylogenies. Can you point to the methods they use or intend to use? Who's working on it? Where do they keep their source code?

jhpoelen commented 1 year ago

@n8upham pointed to

https://github.com/n8upham/MamPhy_v1/blob/master/_DATA/taxonomy_mamPhy_5911species_toPublish.csv

to use for taxonomic alignment with the Upham et al. 2019 mammal phylogeny.
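Using that CSV for alignment comes down to a lookup from scientific name to the `tiplabel` column. A minimal Python sketch, assuming exact matching after case-folding and treating underscores as spaces; `name_column` is a hypothetical parameter for whichever column holds the binomial, since the CSV header is not checked here, and the function names are illustrative only (this is not nomer's matcher).

```python
import csv
from typing import Dict, Iterable


def normalize(name: str) -> str:
    """Case-fold, treat underscores as spaces and collapse whitespace."""
    return " ".join(name.replace("_", " ").split()).lower()


def tiplabel_index(rows: Iterable[Dict[str, str]], name_column: str) -> Dict[str, str]:
    """Map normalized scientific names to the MamPhy 'tiplabel' column.

    The first row wins if two rows normalize to the same name.
    """
    index: Dict[str, str] = {}
    for row in rows:
        name = row.get(name_column, "")
        if name:
            index.setdefault(normalize(name), row["tiplabel"])
    return index


def load_index(csv_path: str, name_column: str) -> Dict[str, str]:
    """Read taxonomy_mamPhy_5911species_toPublish.csv (or similar) into an index."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return tiplabel_index(csv.DictReader(handle), name_column)
```

Looking up `index.get(normalize(name))` then returns a tip label or `None`; fuzzy or synonym-aware matching would have to be layered on top.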

jhpoelen commented 1 year ago

@n8upham also noted that https://vertlife.org/data/mammals/ is the easiest way to get the consensus tree, with the related published files at https://github.com/n8upham/MamPhy_v1/blob/master/_DATA/MamPhy_fullPosterior_BDvr_DNAonly_4098sp_topoFree_NDexp_MCC_v2_target.tre

and https://github.com/n8upham/MamPhy_v1/tree/master

n8upham commented 1 year ago

More details on how to align the taxonomy file from the Mammalia phylogeny of Upham et al. 2019 (MamPhy v1.0) to the Bat Taxonomic Alignment (BTA):

  1. Go to this file: https://github.com/n8upham/MamPhy_v1/blob/master/_DATA/taxonomy_mamPhy_5911species_toPublish.csv

  2. Recommend doing the following to add this taxonomy to the BTA: subset by "ord" = "CHIROPTERA"
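Step 2's subset can be sketched in a few lines of Python that keep every column intact, since the `tiplabel` column is needed later. This assumes the `ord` values are upper-case order names as in the step above (the comparison is made case-insensitive just in case); the function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def subset_by_order(rows: Iterable[Dict[str, str]], order: str = "CHIROPTERA") -> List[Dict[str, str]]:
    """Keep rows whose 'ord' column equals the given order, all columns intact."""
    wanted = order.strip().upper()
    return [row for row in rows if row.get("ord", "").strip().upper() == wanted]


def read_bats(csv_path: str) -> List[Dict[str, str]]:
    """Read the MamPhy taxonomy CSV and keep only the Chiroptera rows."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return subset_by_order(csv.DictReader(handle))
```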

ajacsherman commented 1 year ago

Hi! I won't be able to dive into this until early next week, but I'm not sure whether we want to match our taxonomy to the tree or just add character states, represented by a pie chart or symbol, at the terminal nodes. I don't think it matters that we have blank spaces at the terminal nodes for taxa that are not represented; it might be better to have the blanks so we can visualize gaps in our data. If I remember correctly, constructing a partial tree will take a significant amount of time that could be used more constructively.

I value your input, so please let me know if this is a step you feel strongly about. I realize we need to separate Chiroptera from the rest of the mammal tree, but narrowing down to our exact list of taxa might be unnecessary. I have been out of the tree-generating world since 2014. Do we know whether software other than PAUP etc. is accurate enough to produce trees for publication? I assume, Jorrit, you are asking in order to produce trees with a software add-on? This might need a meeting.

Also, I'm very close to done resolving taxonomic names, so we will have accurate valid names associated with the source taxa in the next hour or so. And do we want to wait until I ingest the newest version of the MDD names into the BTA, so we aren't working off an old version?

Thank you in advance,
Aja

On Wed, Aug 30, 2023 at 12:22 PM Nate Upham @.***> wrote:

More details on how to align the taxonomy file from the Mammalia phylogeny of Upham et al. 2019 (MamPhy v1.0) to the Bat Taxonomic Alignment

  1. Go to this file: https://github.com/n8upham/MamPhy_v1/blob/master/_DATA/taxonomy_mamPhy_5911species_toPublish.csv
  2. Recommend doing the following to add this taxonomy to the BTA: subset by "ord" = "CHIROPTERA"
    • do an automated match to the BTA (I would use "left_join()" in the R dplyr package, but there are many ways to do this)
    • keep all columns of the MamPhy taxonomy -- it is the "tiplabel" column you will need to interact with the phylogenies themselves
    • for those names that don't match:
      • In BTA, not MamPhy
        • add an entry recording which species in the MamPhy phylogeny likely represents that BTA species (and under which taxonomy)
        • can use the "MSW3_sciName_matched" column in the MamPhy taxonomy to assess which MamPhy name matches MSW3 (or whether the name has changed since MSW3)
      • In MamPhy, not BTA
        • add a row in the BTA (but perhaps this case doesn't exist?)


-- Aja Sherman MS Bat Eco-Interactions Database Curator 914-886-8906 @.*** she/her
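The matching steps quoted in Aja's reply (an automated left join of BTA names onto the MamPhy taxonomy, then sorting the leftovers into "In BTA, not MamPhy" and "In MamPhy, not BTA") can be sketched without dplyr. This is a minimal Python sketch under stated assumptions: exact matching on a normalized name, BTA names supplied as a plain list, and a hypothetical `name_column` parameter for the CSV column holding the binomial. Unlike dplyr's `left_join()`, duplicate MamPhy names collapse to the last row rather than multiplying output rows.

```python
from typing import Dict, List, Optional, Tuple


def _key(name: str) -> str:
    # Same normalization on both sides: case-fold, underscores as spaces.
    return " ".join(name.replace("_", " ").split()).lower()


def align(
    bta_names: List[str],
    mamphy_rows: List[Dict[str, str]],
    name_column: str,
) -> Tuple[List[Tuple[str, Optional[Dict[str, str]]]], List[str], List[str]]:
    """Left-join BTA names onto MamPhy taxonomy rows by normalized name.

    Returns (joined, bta_only, mamphy_only):
      joined      - every BTA name paired with its MamPhy row (all columns) or None
      bta_only    - "In BTA, not MamPhy": needs a manual MamPhy stand-in
      mamphy_only - "In MamPhy, not BTA": may need a new BTA row
    """
    by_name = {_key(row[name_column]): row for row in mamphy_rows}
    joined = [(name, by_name.get(_key(name))) for name in bta_names]
    bta_only = [name for name, row in joined if row is None]
    bta_keys = {_key(name) for name in bta_names}
    mamphy_only = [
        row[name_column] for row in mamphy_rows if _key(row[name_column]) not in bta_keys
    ]
    return joined, bta_only, mamphy_only
```

For the "In BTA, not MamPhy" names, one option is to rerun `align` on just those names with `name_column="MSW3_sciName_matched"`, which is the MSW3 comparison suggested in the quoted steps.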