CatalogueOfLife / testing

Editorial tests and discussion to prepare for COL releases

Implementation of new management classification in Vertebrata (Bailly & al., 2022). #186

Open yroskov opened 2 years ago

yroskov commented 2 years ago

TG meeting 2022-03-02, action point: Vertebrata group, upload recommended classification to ChecklistBank.

2022-03-09, ZOOM meeting with @NicBailly.

Final version: COL_Vertebrata_Classification_v5_20220105_sentTG.xlsx

Phylum Chordata
. Subphylum Cephalochordata
. Subphylum Tunicata (=Urochordata)
. Subphylum Vertebrata (= Craniata)
. . Infraphylum Agnatha
. . . Superclass Cyclostomi
. . . . Class Myxini
. . . . Class Petromyzonti
. . Infraphylum Gnathostomata
. . . **Parvphylum** Chondrichthyes
. . . . Class Elasmobranchii
. . . . . Subclass Neoselachii
. . . . Class Holocephali
. . . **Parvphylum** Osteichthyes
. . . . **Gigaclass** Actinopterygii
. . . . . Class Cladistii
. . . . . Class Chondrostei
. . . . . Class Holostei
. . . . . . Subclass Ginglymodi
. . . . . . Subclass Halecomorphi
. . . . . Class Teleostei
. . . . **Gigaclass** Sarcopterygii
. . . . . Unranked placeholder
. . . . . . Class Coelacanthi
. . . . . . Class Dipneusti
. . . . . Tetrapoda
. . . . . . Superclass placeholder
. . . . . . . Class Amphibia
. . . . . . Superclass Amniota
. . . . . . . Class Squamata
. . . . . . . Class Testudines
. . . . . . . Class Crocodilia
. . . . . . . Class Sphenodontia
. . . . . . . Class Aves
. . . . . . . Class Mammalia

yroskov commented 2 years ago

Affected 4 taxa: Parvphylum Chondrichthyes & Parvphylum Osteichthyes; Gigaclass Actinopterygii & Gigaclass Sarcopterygii

2022-03-09: entered as "Unranked".

Correct ranks should be applied once issue 1120 is fixed.

yroskov commented 2 years ago

Classification as implemented, 2022-03-09:

(In gigaclass Sarcopterygii, the two children named "Unranked placeholder" were not established; the child "Tetrapoda" was established as "Unranked".)

[3 screenshots]

yroskov commented 2 years ago

| Bailly, 2022 | ReptileDB | Sectors now (children) |
| --- | --- | --- |
| Class Squamata | Order Squamata | 15 superfamilies and 15 unplaced families |
| Class Testudines | Order Testudines | 2 suborders: Cryptodira & Pleurodira |
| Class Crocodylia | Order Crocodylia | 1 suborder: Eusuchia |
| Class Sphenodontia | Order Rhynchocephalia > Suborder Sphenodontida > Family Sphenodontidae | 1 family: Sphenodontidae |

yroskov commented 2 years ago

| Bailly, 2022 | FishBase of 2018 | Sectors now (children) | Assembled without help 2022-03-17 |
| --- | --- | --- | --- |
| unranked (gigaclass) Actinopterygii with 4 classes | class Actinopterygii | Need @NicBailly assistance | All orders from former class Actinopterygii placed at the root of unranked Actinopterygii |
| ? class Petromyzonti | class Cephalaspidomorphi > order Petromyzontiformes | Need @NicBailly assistance | order Petromyzontiformes placed in class Petromyzonti |
| class Elasmobranchii | class Elasmobranchii | established 1:1 | |
| class Holocephali | class Holocephali | established 1:1 | |
| class Myxini | class Myxini | established 1:1 | |
| unranked (gigaclass) Sarcopterygii with 2 classes | class Sarcopterygii | Need @NicBailly assistance | 3 orders from former class Sarcopterygii placed in unranked Sarcopterygii |

yroskov commented 2 years ago

Synced 2022-03-17 for CoL of March 2022

yroskov commented 2 years ago

Email of 2022-05-04

...we have not finalized the vertebrate classification in CoL yet. Assignments in the gigaclass Actinopterygii still need to be completed. As we agreed, I can do the final steps if you send me a spreadsheet with the taxa in gigaclass Actinopterygii extended down to FishBase orders. Yours, Yuri

As left unresolved: [4 screenshots]

yroskov commented 2 years ago

2022-06-20: Because some suborders in Actinopterygii have now been raised to order rank, we cannot easily translate the FishBase checklist of Feb 2018 into the proposed classification. Solution agreed with Nicolas: delete the four empty classes in gigaclass Actinopterygii; all "old" orders remain children of the gigaclass. Implemented.

The classification will stay like this until new CoLDP data are available for CoL (after July 2022?).

yroskov commented 2 years ago

2022-08-16: Sarcopterygii is displayed in the tree in a way that suggests a broken hierarchy, "class-unranked-superclass-class": [screenshot]

Two ways to resolve the case of "unranked" groups, which the software cannot handle correctly:

1) use an artificial group "Unranked placeholder" as a sister to unranked Tetrapoda: a child of gigaclass Sarcopterygii containing the two classes Coelacanthi & Dipneusti, exactly as in Nicolas' spreadsheet: [screenshot]

or 2) change the rank of Tetrapoda from "unranked" to "megaclass" (cf. WoRMS, 2022-08-16: [screenshot]).

Markus prefers to follow the second route.

Implemented:

[screenshot]

mdoering commented 1 year ago

@yroskov @dhobern is there still work to be done for the (not so) new Vertebrate classification or can we close the issue?

Some major ranks are missing from the classification, although we use a very rich set of ranks. I feel this is wrong: for example, there should at least be regular orders if there are superfamilies and super/mega/gigaclass ranks. See https://github.com/CatalogueOfLife/data/issues/484

On the other hand, we lack useful intermediate ranks such as superfamilies or suborders for birds: https://github.com/CatalogueOfLife/data/issues/207

dhobern commented 1 year ago

Yes. I'm the main bottleneck now. This week I'm in all-day meetings every day and also have 50 CVs to read, so it will probably not be until next week.

Donald



From: Markus Döring; Sent: Monday, September 11, 2023 5:24:33 PM; To: CatalogueOfLife/testing; Cc: Donald Hobern; Mention; Subject: Re: [CatalogueOfLife/testing] Implementation of new management classification in Vertebrata (Bailly & al., 2022). (Issue gbif/portal-feedback#186)


NicBailly commented 1 year ago

@dhobern @mdoering @yroskov (and @NicBailly, to be forwarded to Tom Orrell) [Consultations are also needed with Peter Uetz for reptiles, and with Dave N. for the other Tetrapoda, as they are provided by ITIS].

Topics to confirm and/or answer:

And sorry to ask the following question again, but so far I have had a hard time understanding the answer with certainty, my bad ;-).

mdoering commented 1 year ago

> ... is there currently (meaning a functionality that is running now on the web and in the exports of COL) a functionality that shows / exports a simplified classification with the 7 so-called Linnean ranks (or a customization, in particular with some subranks, e.g., subclass, suborder, ...)?

No, there isn't. I don't think there are major obstacles to developing it, but given limited resources I don't think there will be time to implement it, at least not this year.

Ordering of children is also not fully implemented yet, but I expect this to become available rather quickly, as most of it is already in place.

NicBailly commented 1 year ago

Thank you @mdoering for these clarifications! @dhobern @yroskov So we should be able to make the decisions quickly once Donald has finished what he planned.

dhobern commented 1 year ago

I have Python scripts that will do this for any COLDP dataset, regardless of whether it uses Name, Taxon and Synonym or NameUsage files, and TSV or CSV. They sort the classification and export any subset of the classification ranks.
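Those scripts are not reproduced in this thread. As a rough illustration only, here is a minimal sketch of the same idea, assuming a COLDP NameUsage file whose columns include ID, parentID, rank, scientificName and optionally status (possibly prefixed with "col:"), with TSV vs CSV guessed from the file extension. Each retained usage is reattached to its nearest retained ancestor and siblings are sorted by name. The file name `coldp_subset.py` and all function names are hypothetical; it does not cover the separate Name/Taxon/Synonym layout and needs Python 3.9+ for `str.removeprefix`.

```python
import csv
import sys
from collections import defaultdict
from pathlib import Path


def load_usages(path):
    """Read a COLDP NameUsage file (TSV or CSV) into a dict keyed by ID.

    Assumes columns ID, parentID, rank, scientificName and optionally status;
    a "col:" prefix on column names, if present, is stripped.
    """
    tsv = Path(path).suffix.lower() in (".tsv", ".txt")
    with open(path, encoding="utf8", newline="") as f:
        reader = csv.DictReader(
            f,
            delimiter="\t" if tsv else ",",
            quoting=csv.QUOTE_NONE if tsv else csv.QUOTE_MINIMAL,
        )
        rows = ({k.removeprefix("col:"): v for k, v in r.items() if k} for r in reader)
        return {row["ID"]: row for row in rows}


def rank_of(row):
    """Lower-cased rank, or "" when the rank is missing."""
    return (row.get("rank") or "").lower()


def subset_tree(usages, keep_ranks):
    """Return (depth, name, rank) tuples for accepted usages whose rank is kept.

    Each kept usage is attached to its nearest kept ancestor, so dropping
    intermediate ranks flattens the tree without moving any lineage.
    """
    keep = {r.lower() for r in keep_ranks}

    def kept_parent(uid):
        # Walk up parentID links until a kept usage (or the top) is reached
        pid = usages[uid].get("parentID") or None
        while pid in usages and rank_of(usages[pid]) not in keep:
            pid = usages[pid].get("parentID") or None
        return pid if pid in usages else None

    children = defaultdict(list)
    for uid, row in usages.items():
        is_synonym = "synonym" in (row.get("status") or "").lower()
        if rank_of(row) in keep and not is_synonym:
            children[kept_parent(uid)].append(uid)

    lines = []

    def walk(parent, depth):
        # Siblings sorted alphabetically; a sequence value could replace this key
        for uid in sorted(children[parent], key=lambda u: usages[u]["scientificName"]):
            lines.append((depth, usages[uid]["scientificName"], usages[uid]["rank"]))
            walk(uid, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python coldp_subset.py NameUsage.tsv phylum class order family
    usages = load_usages(sys.argv[1])
    for depth, name, rank in subset_tree(usages, sys.argv[2:]):
        print(". " * depth + f"{name} [{rank}]")
```

Walking up to the nearest kept ancestor, rather than cutting nodes out one at a time, means any combination of ranks can be requested in a single pass.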



From: Markus Döring; Sent: Tuesday, September 12, 2023 7:33:31 PM; To: CatalogueOfLife/testing; Cc: Donald Hobern; Mention; Subject: Re: [CatalogueOfLife/testing] Implementation of new management classification in Vertebrata (Bailly & al., 2022). (Issue gbif/portal-feedback#186)


dhobern commented 1 year ago

I'd like us to get some resolution here. Here is the current classification of Chordata from COL down to family rank:

dataset-267522.txt

I've written some code to filter this with any subset of ranks. I'll add subsequent comments exploring what we find.

To use the code, put it in a file called subset.py, place it in the same folder as the Chordata tree above and type "python subset.py help" to get instructions. Basically, replace "help" with a series of space-separated integer values in the range 0 (unranked Biota) to 16 (family) to get a subset just for those ranks.

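```python
import sys

# Ranks present in the Chordata tree export, from highest to lowest.
# The index of each rank is the value passed on the command line.
ranks = [
    "unranked",
    "kingdom",
    "phylum",
    "subphylum",
    "infraphylum",
    "parvphylum",
    "gigaclass",
    "megaclass",
    "superclass",
    "class",
    "subclass",
    "infraclass",
    "order",
    "suborder",
    "infraorder",
    "superfamily",
    "family",
]

# Default selection: kingdom, phylum, class, order, family
subset = [1, 2, 9, 12, 16]
show_help = False

if len(sys.argv) > 1:
    try:
        subset = sorted(int(s) for s in sys.argv[1:])
        # Reject indices outside the ranks list
        if subset[0] < 0 or subset[-1] >= len(ranks):
            show_help = True
    except ValueError:
        # Non-integer argument, e.g. "help"
        show_help = True

if show_help:
    print(f"\nUsage: python {sys.argv[0]} <series of integers in range 0 to {len(ranks) - 1}>\n\nValues:\n")
    for i, s in enumerate(ranks):
        print(f"  {i:>2}: {s.capitalize()}")
    print("")
    sys.exit(1)

# Rank labels as they appear in the tree file, e.g. "[class]"
labels = [f"[{ranks[i]}]" for i in subset]

with open("dataset-267522.txt", encoding="utf8", newline="") as f:
    for line in f:
        # Keep a line if it carries one of the selected rank labels;
        # each two-space indent is rendered as ". " to show depth
        if any(label in line for label in labels):
            print(line.rstrip().replace("  ", ". "))
```
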
dhobern commented 1 year ago

First, selecting all ranks from kingdom down to infraclass (python subset.py 1 2 3 4 5 6 7 8 9 10 11), we get:

. Animalia [kingdom]
. . Chordata [phylum]
. . . Cephalochordata [subphylum]
. . . . Leptocardii [class]
. . . Tunicata [subphylum]
. . . . Appendicularia [class]
. . . . Ascidiacea Blainville, 1824 [class]
. . . . Thaliacea Van der Haeven, 1850 [class]
. . . Vertebrata [subphylum]
. . . . Agnatha [infraphylum]
. . . . . Cyclostomi [superclass]
. . . . . . Myxini [class]
. . . . . . Petromyzonti [class]
. . . . Gnathostomata [infraphylum]
. . . . . Chondrichthyes [parvphylum]
. . . . . . Elasmobranchii [class]
. . . . . . Holocephali [class]
. . . . . Osteichthyes [parvphylum]
. . . . . . Actinopterygii [gigaclass]
. . . . . . Sarcopterygii [gigaclass]
. . . . . . . Coelacanthi [class]
. . . . . . . Dipneusti [class]
. . . . . . . Tetrapoda [megaclass]
. . . . . . . . Amniota [superclass]
. . . . . . . . . Aves [class]
. . . . . . . . . Crocodylia [class]
. . . . . . . . . Mammalia Linnaeus, 1758 [class]
. . . . . . . . . . Prototheria Gill, 1872 [subclass]
. . . . . . . . . . Theria Parker & Haswell, 1897 [subclass]
. . . . . . . . . . . Eutheria Gill, 1872 [infraclass]
. . . . . . . . . . . Metatheria Huxley, 1880 [infraclass]
. . . . . . . . . Sphenodontia [class]
. . . . . . . . . Squamata [class]
. . . . . . . . . Testudines [class]
. . . . . . . . Amphibia [class]

I think we should certainly make sure that this classification can be accessed (via a comment or link) wherever we display higher chordate classification, but for most purposes the phylogenetic information included seems excessive for a species list.

I'll follow the principle that dropping low-information nodes and elevating their children in their place simply removes phylogenetic detail; it does not imply anything we would consider false. However, as long as we keep a higher node such as Gnathostomata, Osteichthyes or Sarcopterygii, Tetrapoda and its children must remain nested beneath it.

In other words, the following would be perfectly valid, especially if we add a suitable sequence for the classes:

Animalia [kingdom]
. Chordata [phylum]
. . Leptocardii [class]
. . Appendicularia [class]
. . Ascidiacea Blainville, 1824 [class]
. . Thaliacea Van der Haeven, 1850 [class]
. . Myxini [class]
. . Petromyzonti [class]
. . Elasmobranchii [class]
. . Holocephali [class]
. . Coelacanthi [class]
. . Dipneusti [class]
. . Aves [class]
. . Crocodylia [class]
. . Mammalia Linnaeus, 1758 [class]
. . Sphenodontia [class]
. . Squamata [class]
. . Testudines [class]
. . Amphibia [class]

Subclasses and infraclasses only appear under Mammalia, so we can leave those as a choice for the mammal GSD to decide.

The ragged diagonal left edge of this view clearly shows that most of the intermediate ranks exist to make sure tetrapods are shown in a phylogenetically reasonable way.

The three subphyla seem fine and unconfusing to me; I'd keep them the way they are rather than losing Chordata.

The gigaclasses highlight the ray-finned fishes and the lineage that leads to tetrapods, but the children of Actinopterygii are all orders (no classes), so it could easily just be shown as a class. We could then lose Sarcopterygii.

Of the two superclasses, Cyclostomi is effectively a synonym for Agnatha in our classification, and Amniota only exists to carve off Amphibia. I suggest we drop superclass as a displayed rank.

If we make those changes, we get a clearer hierarchy for most users:

Animalia [kingdom]
. Chordata [phylum]
. . Cephalochordata [subphylum]
. . . Leptocardii [class]
. . Tunicata [subphylum]
. . . Appendicularia [class]
. . . Ascidiacea Blainville, 1824 [class]
. . . Thaliacea Van der Haeven, 1850 [class]
. . Vertebrata [subphylum]
. . . Agnatha [infraphylum]
. . . . Myxini [class]
. . . . Petromyzonti [class]
. . . Gnathostomata [infraphylum]
. . . . Chondrichthyes [parvphylum]
. . . . . Elasmobranchii [class]
. . . . . Holocephali [class]
. . . . Osteichthyes [parvphylum]
. . . . . Actinopterygii [class]
. . . . . Coelacanthi [class]
. . . . . Dipneusti [class]
. . . . . Tetrapoda [megaclass]
. . . . . . Aves [class]
. . . . . . Crocodylia [class]
. . . . . . Mammalia Linnaeus, 1758 [class]
. . . . . . Sphenodontia [class]
. . . . . . Squamata [class]
. . . . . . Testudines [class]
. . . . . . Amphibia [class]

We could lose more ranks and make it simpler. Personally, however, this seems to me a reasonable compromise, highlighting only the jawless/jawed, cartilaginous/bony, and ray-finned/coelacanth/lungfish/tetrapod divisions.
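The nesting principle stated earlier (dropping a node only loses detail, but a retained higher node such as Osteichthyes must still contain Tetrapoda) can be checked mechanically. A minimal sketch, assuming the tree is held as a hypothetical child-to-parent dict with None at the top: `collapse` drops nodes and reattaches survivors to their nearest kept ancestor, and `preserves_nesting` confirms that each kept node's ancestors in the reduced tree are exactly its kept ancestors in the full tree. All names here are illustrative, not an existing COL or CLB function.

```python
def ancestors(parent, node):
    """Return the set of ancestors of node in a child -> parent mapping."""
    seen = set()
    node = parent.get(node)
    while node is not None:
        seen.add(node)
        node = parent.get(node)
    return seen


def collapse(parent, drop):
    """Drop the nodes in `drop`, attaching each survivor to its nearest kept ancestor."""
    reduced = {}
    for child, p in parent.items():
        if child in drop:
            continue
        while p in drop:
            p = parent[p]
        reduced[child] = p
    return reduced


def preserves_nesting(full, reduced):
    """True if, among kept nodes, ancestry is exactly what the full tree says.

    A kept node may lose dropped ancestors (less detail), but it must keep
    every kept ancestor and must not gain any new one.
    """
    kept = set(reduced)
    return all(ancestors(reduced, n) == ancestors(full, n) & kept for n in reduced)


if __name__ == "__main__":
    # Toy excerpt of the full tree, child -> parent (None marks the top)
    full = {
        "Chordata": None,
        "Vertebrata": "Chordata",
        "Gnathostomata": "Vertebrata",
        "Osteichthyes": "Gnathostomata",
        "Sarcopterygii": "Osteichthyes",
        "Dipneusti": "Sarcopterygii",
        "Tetrapoda": "Sarcopterygii",
        "Amphibia": "Tetrapoda",
        "Amniota": "Tetrapoda",
        "Mammalia": "Amniota",
    }
    reduced = collapse(full, {"Sarcopterygii", "Amniota"})
    print(preserves_nesting(full, reduced))  # True: only detail was lost

    # Moving Tetrapoda out from under Osteichthyes, which is still shown
    wrong = dict(reduced, Tetrapoda="Vertebrata")
    print(preserves_nesting(full, wrong))  # False: places tetrapods outside Osteichthyes
```

Note that a simple subset test ("no new ancestors") would not catch the second case; requiring equality with the kept ancestors is what enforces that Tetrapoda stays under every retained higher node.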

@NicBailly @mdoering Thoughts?

dhobern commented 1 year ago

So these would be my suggestions:

  1. We add a dataset to CLB that we can reference and that includes just the more phylogenetically complete representation of the Chordata shown above (with however much improvement we want to make via better sequencing).
  2. We simplify the version in the COL Checklist to the simpler subset at the end of my last comment, again with better sequencing.
  3. We add a displayable remark at least to COL Vertebrata and all its children down to class rank that indicates the phylogeny is simplified and that a fuller version can be viewed at the link for the CLB dataset in bullet 1.

Right now, the sequencing does not affect the display in CLB and COL. I have an open defect on this. Once it is fixed, this view should no longer be alphabetical: https://www.checklistbank.org/dataset/55434/classification
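Once sequencing is honoured, sibling order would come from an explicit position rather than from the name. A minimal sketch of such an ordering rule, assuming a hypothetical name-to-position mapping (how CLB actually stores the sequence is not described in this thread): sequenced siblings come first in sequence order, and any without a position fall back to alphabetical order. The positions in the example are purely illustrative, not a proposed COL sequence.

```python
def order_children(names, sequence):
    """Sort sibling names by an explicit sequence, falling back to the name.

    names: sibling taxon names
    sequence: hypothetical dict mapping a name to its position
    Sequenced names come first, in sequence order; the rest follow alphabetically.
    """
    return sorted(names, key=lambda n: (n not in sequence, sequence.get(n, 0), n))


if __name__ == "__main__":
    classes = ["Amphibia", "Aves", "Crocodylia", "Mammalia", "Sphenodontia", "Squamata", "Testudines"]
    # Illustrative positions only, not an agreed COL sequence
    seq = {"Amphibia": 1, "Mammalia": 2, "Testudines": 3}
    print(order_children(classes, seq))
    # ['Amphibia', 'Mammalia', 'Testudines', 'Aves', 'Crocodylia', 'Sphenodontia', 'Squamata']
```

Falling back to the name keeps the output deterministic while a sequence is only partially filled in.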

We also need a way to attach special remarks to nodes in checklist datasets so these get displayed on relevant views in COL and CLB. I'd like this to be twofold: an actual information box at the top of the COL taxon page for the taxon in question (e.g. on https://www.catalogueoflife.org/data/taxon/8V4V3 for Vertebrata), and some kind of clickable or hoverable icon next to nodes in the browse view.

@mdoering Any thoughts on how we could do this?

@mdoering - please also look at that Vertebrata link - are the mysterious Other / Unknown labels on the pie chart (all of which lead to the dodo) an already reported issue?

mdoering commented 1 year ago

1. That works, but I am slightly worried we will create something that quickly gets out of sync with the main project. Keeping two places up to date manually is unlikely to work. We could use the new dataset as a source to also sync the vertebrate classification from. But then we need to reattach all sectors, which I guess isn't too bad, and at least far less work than keeping two datasets updated. In the sector settings you can specify which ranks you want to include, so all others would be left out and the tree flattened, as you did with your code.

2. Simplifying the COL tree can, I think, be done with the API (maybe even the UI), which allows deleting a taxon while attaching all its children to its parent, so nothing gets lost.

3. Sounds good. The question is how exactly the comments are added. We have an unstructured remarks field, but that might not be the best solution. Most importantly, we'd need to know whether this only applies to the management classification taxa, which we do not source from a GSD. Those should be stable and we can update them. GSD taxa will not be stable and would need a system like the one we have for species estimates, which we seem to have forgotten about in the past years. There is also the link attribute, which could be used to refer to the phylogenetic version.

mdoering commented 1 year ago

I've uploaded your classification here and used it as a sector in a test project of mine, where I removed a few class ranks. It seems you want to flexibly suppress certain names, not all usages of a given rank? That is also possible via decisions. I think that would work quite nicely, as it also documents how you get from the full tree to the reduced one in COL.

[screenshot]
dhobern commented 1 year ago

Thanks, @mdoering. I agree that there is a risk of drift between a reference phylogenetic classification and the management classification. Decision-based reduction would be fine if it scales well enough.

I need to know from @NicBailly whether the intermediate ranks assigned to these names are all well established and fixed. With some small adjustments, we could probably have a list of ranks to exclude in the default view.

Right now, the main challenge is Actinopterygii which is given gigaclass rank (presumably so it looks tidy next to Sarcopterygii which has to be above the rank of class). If Actinopterygii is converted to a class, we would be close to getting this to work simply by suppressing some ranks.

@mdoering Also - please look at the issue I noted above about the Vertebrata pie chart, https://www.catalogueoflife.org/data/taxon/8V4V3 - lots of dead links from it.

mdoering commented 1 year ago

There is nothing wrong with using ignore decisions; we use them by the thousands throughout the COL checklist. So we can definitely simply sync the full phylogenetic data.

mdoering commented 1 year ago

for the portal issue see https://github.com/CatalogueOfLife/portal/issues/205

dhobern commented 1 year ago

Does ignore allow the node to be ignored but the children to be included?

mdoering commented 1 year ago

> Does ignore allow the node to be ignored but the children to be included?

Yes. Ignore just skips that very name, while a block decision blocks the entire subtree. You can choose whichever is appropriate.

https://github.com/CatalogueOfLife/backend/blob/master/api/src/main/java/life/catalogue/api/model/EditorialDecision.java#L34
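To make the difference concrete, here is a toy sketch of the two behaviours as described in this reply. It is not the ChecklistBank implementation (the EditorialDecision model linked above is the real one); the function name and the "ignore"/"block" strings are hypothetical stand-ins. Ignoring a name promotes its children to its parent; blocking it drops the whole subtree.

```python
def apply_decisions(children, root, decisions):
    """Return a new name -> child-list tree after applying decisions.

    children: dict mapping a name to the list of its child names
    decisions: dict mapping a name to "ignore" or "block" (hypothetical labels)
    The root itself is assumed to carry no decision.
    """
    result = {}

    def surviving_children(node):
        # Build the child list `node` ends up with, recording kept subtrees
        out = []
        for child in children.get(node, []):
            mode = decisions.get(child)
            if mode == "block":
                continue  # drop the child and everything below it
            if mode == "ignore":
                out.extend(surviving_children(child))  # promote its children
            else:
                out.append(child)
                result[child] = surviving_children(child)
        return out

    result[root] = surviving_children(root)
    return result


if __name__ == "__main__":
    tree = {
        "Osteichthyes": ["Actinopterygii", "Sarcopterygii"],
        "Sarcopterygii": ["Coelacanthi", "Dipneusti", "Tetrapoda"],
        "Tetrapoda": ["Amniota", "Amphibia"],
        "Amniota": ["Aves", "Mammalia"],
    }
    flat = apply_decisions(tree, "Osteichthyes", {"Sarcopterygii": "ignore", "Amniota": "ignore"})
    print(flat["Osteichthyes"])  # ['Actinopterygii', 'Coelacanthi', 'Dipneusti', 'Tetrapoda']
    print(flat["Tetrapoda"])  # ['Aves', 'Mammalia', 'Amphibia']
    cut = apply_decisions(tree, "Osteichthyes", {"Sarcopterygii": "block"})
    print(cut["Osteichthyes"])  # ['Actinopterygii']
```

This is why ignore decisions suit the "suppress certain names" case discussed above: the reduced tree keeps every descendant, and the set of decisions documents exactly which nodes were hidden.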

yroskov commented 11 months ago

New version of Chordata classification - see emails from Donald & Markus of 2023-11-27.

Dataset id 279229 in CLB: https://www.checklistbank.org/dataset/279229/classification

COL Checklist Chordata Higher Classification, ver 1.0 / 2023-11-27

yroskov commented 11 months ago

Compare Nicolas (Mar 2022) vs Donald (Nov 2023):

Subphyla Cephalochordata & Tunicata = NO CHANGES (classes as lower level)

[screenshot]

Subphylum Vertebrata - infraphylum Agnatha = EXCLUDE superclass Cyclostomi = DONE 2024-02-05

[screenshot]

Subphylum Vertebrata - infraphylum Gnathostomata - parvphylum Chondrichthyes = NO CHANGES (classes as lower level)

[screenshot]

Subphylum Vertebrata - infraphylum Gnathostomata - parvphylum Osteichthyes = EXCLUDE gigaclass Actinopterygii & create class Actinopterygii with FishBase orders as they are in CoL; EXCLUDE gigaclass Sarcopterygii = DONE 2024-02-05

[screenshot]

Subphylum Vertebrata - infraphylum Gnathostomata - parvphylum Osteichthyes - gigaclass Sarcopterygii - megaclass Tetrapoda = EXCLUDE superclass Amniota = DONE 2024-02-05

[screenshot]

yroskov commented 11 months ago

Important!

There are only 4 synonyms (already flagged as "ambiguous synonyms"): https://www.checklistbank.org/dataset/279229/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&limit=50&offset=0&status=ambiguous%20synonym

| Synonym | Accepted | Parent |
| --- | --- | --- |
| Reptilia | ambiguous synonym of Sphenodontia | unranked Tetrapoda > Sphenodontia |
| Reptilia | ambiguous synonym of Squamata | unranked Tetrapoda > Squamata |
| Reptilia | ambiguous synonym of Crocodylia | unranked Tetrapoda > Crocodylia |
| Reptilia | ambiguous synonym of Testudines | unranked Tetrapoda > Testudines |

Implementation of 2024-02-05: orders Rhynchocephalia, Squamata, Crocodylia & Testudines established as new sectors from resource id 279229 and synced; ReptileDB child taxa re-established as sectors under the orders of resource id 279229.

CONCLUSION:

The main problem: the new version (like Nicolas' version) does not match the classification used by ReptileDB: http://www.reptile-database.org/db-info/taxa.html#Sau. If we implement this version, we will either have identical names for reptile classes and orders in CoL, or lose order as a rank in all reptile data (as now). = Awaiting reply from the Taxonomy Group

yroskov commented 9 months ago

AS A RESULT of implementing the classification from resource id 279229 "COL Checklist Chordata Higher Classification, ver 1.0 / 2023-11-27", the rank Class is now missing for all reptiles.

yroskov commented 9 months ago

@dhobern, CoL management classification adjusted now according to the dataset COL Checklist Chordata Higher Classification, ver 1.0 / 2023-11-27 (id 279229, https://www.checklistbank.org/dataset/279229/classification).

Results are available at the PREVIEW website https://preview.catalogueoflife.org/

dhobern commented 9 months ago

Thanks - looks great.

mdoering commented 8 months ago

Is there no chance we can place the reptile orders into some class?