CatalogueOfLife / testing

Editorial tests and discussion to prepare for COL releases

Classification conflicts inside Lepidoptera #224

Open yroskov opened 1 year ago

yroskov commented 1 year ago

@dhobern, I have looked through the report on duplicated genera in the CoL (2023-04-19):

https://www.checklistbank.org/dataset/3/duplicates?category=uninomial&codeDifferent=false&limit=980&minSize=2&mode=STRICT&offset=0&rank=genus&status=accepted
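The filters behind this report are all visible in the URL's query string. As a small sketch using only Python's standard library (it does not call ChecklistBank, and the helper name `report_filters` is hypothetical), the parameters can be listed like this:

```python
from urllib.parse import urlsplit, parse_qs

# The report URL exactly as given above, split over several lines for readability.
REPORT_URL = (
    "https://www.checklistbank.org/dataset/3/duplicates"
    "?category=uninomial&codeDifferent=false&limit=980&minSize=2"
    "&mode=STRICT&offset=0&rank=genus&status=accepted"
)


def report_filters(url: str) -> dict[str, str]:
    """Return the query parameters of a report URL as a flat dict."""
    query = urlsplit(url).query
    # parse_qs returns a list of values per key; each key occurs once here.
    return {key: values[0] for key, values in parse_qs(query).items()}


filters = report_filters(REPORT_URL)
for key, value in sorted(filters.items()):
    print(f"{key} = {value}")
```

The parameter names suggest a report restricted to accepted, genus-rank uninomials occurring at least twice (`minSize=2`); their exact semantics are ChecklistBank's and are not verified here.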

It looks like there are classification conflicts between data providers. At the least, I have spotted the following conflicts. (The classification in the examples below is incomplete; it is shown as it appears in the report.)

(A) There is a significant set of identical genera in families Meessiidae at GLI vs Tineidae at TineidaeNHM:
Lepidoptera> Tineoidea> Tineidae vs Lepidoptera> Tineoidea> Meessiidae

(B) genera Brachydoxa, Dryadaula, Metasticha in GLI vs TineidaeNHM:
Lepidoptera> Tineoidea> Dryadaulidae vs Lepidoptera> Tineoidea> Tineidae
and
Lepidoptera> Tineoidea> Psychidae vs Lepidoptera> Tineoidea> Tineidae

(C) genus Cryptologa in GLI vs Global Gracillariidae:
Lepidoptera> Gracillarioidea> Douglasiidae vs Gracillariidae> Gracillariinae> Gracillariini

(D) genus Phyllocnistis in Gelechiidae vs Global Gracillariidae:
Lepidoptera> Gelechioidea> Gelechiidae vs Gracillariidae> Phyllocnistinae> Phyllocnistini

(E) There is a set of identical genera in GLI vs Gelechiidae:

genus Anacampsis: Lepidoptera> Gelechioidea> Gelechiidae vs Gelechioidea> Gelechiidae> Anacampsina
genus Anarsia: Gelechioidea> Gelechiidae> Dichomeridinae vs Lepidoptera> Gelechioidea> Gelechiidae
genus Barea: Gelechioidea> Oecophoridae> Oecophorinae vs Lepidoptera> Gelechioidea> Gelechiidae
genus Borkhausenia: Gelechioidea> Oecophoridae> Oecophorinae vs Lepidoptera> Gelechioidea> Gelechiidae
genus Copidostola: Lepidoptera> Gelechioidea> Gelechiidae vs Gelechioidea> Oecophoridae> Oecophorinae
genus Dichomeris: Gelechioidea> Gelechiidae> Dichomeridinae vs Lepidoptera> Gelechioidea> Gelechiidae
genus Gelechia: Lepidoptera> Gelechioidea> Gelechiidae vs Gelechiidae> Gelechiinae> Gelechiini
genus Gymnobathra: Gelechioidea> Oecophoridae> Oecophorinae vs Lepidoptera> Gelechioidea> Gelechiidae
genus Hierodoris: Lepidoptera> Gelechioidea> Gelechiidae vs Gelechioidea> Oecophoridae> Oecophorinae
genus Macrobathra: Gelechioidea> Cosmopterigidae> Cosmopteriginae vs Lepidoptera> Gelechioidea> Gelechiidae
genus Nothris: Gelechioidea> Gelechiidae> Anacampsinae vs Lepidoptera> Gelechioidea> Gelechiidae
genus Phthorimaea: Lepidoptera> Gelechioidea> Gelechiidae vs Gelechiidae> Gelechiinae> Gnorimoschemini
genus Psoricoptera: Lepidoptera> Gelechioidea> Gelechiidae vs Gelechioidea> Gelechiidae> Gelechiinae
genus Scieropepla: Lepidoptera> Gelechioidea> Gelechiidae vs Lepidoptera> Gelechioidea> Xyloryctidae
genus Symmoca: Lepidoptera> Gelechioidea> Gelechiidae vs Gelechioidea> Autostichidae> Symmocinae
genus Trachyntis: Lepidoptera> Gelechioidea> Gelechiidae vs Gelechioidea> Oecophoridae> Oecophorinae
genus Xystophora: Lepidoptera> Gelechioidea> Gelechiidae vs Gelechioidea> Gelechiidae> Anomologinae

yroskov commented 1 year ago

It would be nice to eliminate these conflicts in AC23.

I can fix cases B, C & E by blocking the appropriate genera in GLI (although we cannot guarantee that such decisions will persist and be consistently re-applied in CLB from one update to the next).

It would be nice to resolve case A (Meessiidae vs Tineidae) in the source datasets.

For case D (Phyllocnistis in Gelechiidae vs Global Gracillariidae), I need your advice on the preferred placement. I can block the genus in CLB.

dhobern commented 1 year ago

Thanks, Yury.

Quick comments on each.

A), B) GLI is correct here - these are no longer treated as Tineidae. I am slowly working on a cleaned version of the NHM Tineidae dataset. In the meantime, please block these genera from the NHM dataset if possible. I will address more effectively as soon as I can.

C) Accept the Global Gracillariidae for this. I will move it inside GLI to avoid the collision.

D) GLI doesn't include a genus Phyllocnistis, just a species with no current name that was described in that genus. The genus should remain in Global Gracillariidae. I'll check whether I can mark the incertae sedis nature of the Gelechiidae species better.

E) I'd just found one or two of these and will fix them up.

dhobern commented 1 year ago

Looking at other examples, do you have a preferred way for us to mark binomials as placeholders for incertae sedis species like the "Phyllocnistis" species? In TaxonWorks, in GLI, I mark such cases as incertae sedis and TW then places square brackets around the genus name. As far as I can see, no other information on the status of these names comes through from TW to CLB.

Would square brackets around the genus be sufficient to make it work on your side? Would you like any other changes?

dhobern commented 1 year ago

Sorry @yroskov I don't understand some of your (E) - why does it matter that these genera appear in different places within the family Gelechiidae if the Gelechiidae dataset replaces the family in GLI?

yroskov commented 1 year ago

> Sorry @yroskov I don't understand some of your (E) - why does it matter that these genera appear in different places within the family Gelechiidae if the Gelechiidae dataset replaces the family in GLI?

Whoops! Sorry, internal Gelechiidae duplicates should be excluded from case E.
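With that correction, a duplicate only counts towards case E when its two placements name different families. A minimal sketch of that filter, assuming the family is the path element ending in "idae" in each classification as copied from the report (the helper names are hypothetical; the sample rows are taken from the case E list):

```python
from typing import Optional

# Two placements per duplicated genus, copied from the case E list.
DUPLICATES = {
    "Anacampsis": ("Lepidoptera> Gelechioidea> Gelechiidae",
                   "Gelechioidea> Gelechiidae> Anacampsina"),
    "Barea": ("Gelechioidea> Oecophoridae> Oecophorinae",
              "Lepidoptera> Gelechioidea> Gelechiidae"),
    "Gelechia": ("Lepidoptera> Gelechioidea> Gelechiidae",
                 "Gelechiidae> Gelechiinae> Gelechiini"),
    "Scieropepla": ("Lepidoptera> Gelechioidea> Gelechiidae",
                    "Lepidoptera> Gelechioidea> Xyloryctidae"),
}


def family_of(path: str) -> Optional[str]:
    """Return the first path element ending in 'idae', or None if there is none."""
    for element in path.split(">"):
        name = element.strip()
        if name.endswith("idae"):
            return name
    return None


def cross_family_conflicts(duplicates: dict) -> dict:
    """Keep only genera whose two placements resolve to different families."""
    conflicts = {}
    for genus, (first, second) in duplicates.items():
        families = (family_of(first), family_of(second))
        # Skip pairs where a family could not be read from the partial path.
        if None not in families and families[0] != families[1]:
            conflicts[genus] = families
    return conflicts


conflicts = cross_family_conflicts(DUPLICATES)
for genus, (first_family, second_family) in sorted(conflicts.items()):
    print(f"{genus}: {first_family} vs {second_family}")
```

On these four rows, Anacampsis and Gelechia drop out as internal Gelechiidae duplicates, while Barea and Scieropepla remain as cross-family conflicts.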

yroskov commented 1 year ago

@dhobern, I included A, B & C in the plan for the May edition. Unfortunately, I cannot guarantee that blocking decisions will keep working with further updates.

dhobern commented 1 year ago

Thanks - I'll work on fixing Tineidae as soon as I can.


dhobern commented 1 year ago

Actually, those cases in (E) are mostly the same as your case (D) - these are incertae sedis species with binomials that do not reflect an acceptable genus. Should I enclose the genera in these cases with square brackets or do you have a preferred way to represent such cases? TaxonWorks uses the square brackets for such examples.

yroskov commented 1 year ago

> Looking at other examples, do you have a preferred way for us to mark binomials as placeholders for incertae sedis species like the "Phyllocnistis" species? In TaxonWorks, in GLI, I mark such cases as incertae sedis and TW then places square brackets around the genus name. As far as I can see, no other information on the status of these names comes through from TW to CLB.
>
> Would square brackets around the genus be sufficient to make it work on your side? Would you like any other changes?

For me, square brackets around the genus are the most natural way to present the temporary placement of incertae sedis species in the checklist. This is often the case in zoological GSDs with unresolved taxonomy, whereas botanical GSDs create provisional combinations ("nomina inedita", flagged as "provisionally accepted" in the CoL).

I would like to see the same presentation in the CoL, i.e. genera in square brackets in species names and in the classification, with the CoL status "provisionally accepted name". Unfortunately, CLB does not allow this. Here is a report on our experiments with bracketed genera in Systema Dipterorum: https://github.com/CatalogueOfLife/testing/issues/127#issuecomment-1291120920

@mdoering, can we please come back to this and alter CLB for proper handling of genera in square brackets?

yroskov commented 1 year ago

> Actually, those cases in (E) are mostly the same as your case (D) - these are incertae sedis species with binomials that do not reflect an acceptable genus. Should I enclose the genera in these cases with square brackets or do you have a preferred way to represent such cases? TaxonWorks uses the square brackets for such examples.

I would choose square brackets for these cases. At least, it removes user confusion over split genera in the Tree (in both cases: whether CLB blocks them from CoL, or CLB includes them in CoL with adequate presentation).

mdoering commented 1 year ago

We have had this discussion many times before. If we really want square brackets rendered, we need a new flag for the parsed name that indicates that, probably even on the usage rather than the name. This makes things a lot more complicated, and I am not convinced this is a universal practice.

In any case this is nothing we can do in a day, so even if we want this it will have to wait until after summer when we have the first extended checklist.

For now let's please stick to the ColDP / CLB convention as we know it: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#species-with-an-uncertain-genus
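As a rough sketch of what following that convention could look like on the source side: a bracketed genus, as TaxonWorks renders it, can be unwrapped into a plain binomial plus a separate provisional flag, so that the name itself carries no extra markup. The function name and the returned keys below are illustrative assumptions, not part of ColDP, CLB, or TaxonWorks.

```python
import re

# Matches names such as "[Phyllocnistis] spilota": a genus in square
# brackets followed by the rest of the name.
BRACKETED_GENUS = re.compile(r"^\[(?P<genus>[A-Z][a-z-]+)\]\s+(?P<rest>\S.*)$")


def unwrap_bracketed_genus(name: str) -> dict:
    """Split a display name into a plain scientific name and a provisional flag.

    The flag only records whether brackets were found; a name without
    brackets may still be provisional for other reasons.
    """
    cleaned = name.strip()
    match = BRACKETED_GENUS.match(cleaned)
    if match is None:
        return {"scientificName": cleaned, "provisional": False}
    plain = f"{match.group('genus')} {match.group('rest')}"
    return {"scientificName": plain, "provisional": True}


print(unwrap_bracketed_genus("[Phyllocnistis] spilota"))
print(unwrap_bracketed_genus("Infurcitinea incertula"))
```

Keeping the flag outside the name string is the point of the exercise: the name stays parseable as an ordinary binomial, and the uncertainty travels as data.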

mdoering commented 1 year ago

Basically the name is a name, not a vehicle for all kinds of extra information.

yroskov commented 1 year ago

> For now let's please stick to the ColDP / CLB convention as we know it: https://github.com/CatalogueOfLife/coldp/blob/master/README.md#species-with-an-uncertain-genus
>
> COL strongly recommends to flag the species taxon with provisional=true

That is exactly what I cannot achieve in CLB.

@mdoering, are GSDs able to flag these names as "provisional=true" in their ColDP export?

I got an answer from Geoff. He will open a ticket for the TW exporter.

mdoering commented 1 year ago

> COL strongly recommends to flag the species taxon with provisional=true
>
> That is exactly what I cannot achieve in CLB.
>
> @mdoering, are GSDs able to flag these names as "provisional=true" in their ColDP export?

Of course they can. Either via Taxon.provisional or NameUsage.status=provisionally accepted depending on how they share their data.
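To illustrate the second option, here is a minimal sketch that writes NameUsage rows with the status column set. Only `status` and the value `provisionally accepted` come from the sentence above; the other column names (`ID`, `parentID`, `rank`, `scientificName`) are an assumed minimal layout to be checked against the ColDP README, the IDs are made up, and attaching the species directly to the family is likewise an assumption.

```python
import csv
import io

# Assumed minimal column set; verify against the ColDP specification.
COLUMNS = ["ID", "parentID", "status", "rank", "scientificName"]


def name_usage_tsv(rows: list) -> str:
    """Serialise rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical IDs; the species is attached to the family for illustration.
ROWS = [
    {"ID": "fam1", "parentID": "", "status": "accepted",
     "rank": "family", "scientificName": "Gelechiidae"},
    {"ID": "sp1", "parentID": "fam1", "status": "provisionally accepted",
     "rank": "species", "scientificName": "Phyllocnistis spilota"},
]

tsv_text = name_usage_tsv(ROWS)
print(tsv_text, end="")
```

A dataset that shares separate Name and Taxon files would instead set the `provisional` field on the Taxon record, as in the first option.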

dhobern commented 1 year ago

Great - I have the provisional status in my other datasets but forgot to set it for these. They should be good now. So you can reimport Gelechiidae whenever suits you.

yroskov commented 1 year ago

Done. Gelechiidae synced for May edition.

yroskov commented 1 year ago

List of Meessiidae genera in GLI (35):

| Genus | Blocked in Tineidae NHM |
| --- | --- |
| Afrocelestis | 2023-04-21 |
| Agnathosia | 2023-04-21 |
| Agoraula | 2023-04-21 |
| Augolychna | 2023-04-21 |
| Bathroxena | 2023-04-21 |
| Clinograptis | 2023-04-21 |
| Diachorisia | 2023-04-21 |
| Doleromorpha | 2023-04-21 |
| Emblematodes | 2023-04-21 |
| Epactris | 2023-04-21 |
| Eudarcia | 2023-04-21 |
| Galachrysis | 2023-04-21 |
| Homosetia | 2023-04-21 |
| Homostinea | 2023-04-21 |
| Hybroma | 2023-04-21 |
| Infurcitinea | 2023-04-21 |
| Ischnoscia | 2023-04-21 |
| Isocorypha | 2023-04-21 |
| Leucomele | 2023-04-21 |
| Lichenotinea | 2023-04-21 |
| Matratinea | 2023-04-21 |
| Mea | 2023-04-21 |
| Meneessia | 2023-04-21 |
| Montetinea | 2023-04-21 |
| Nannotinea | 2023-04-21 |
| Novotinea | 2023-04-21 |
| Oenoe | 2023-04-21 |
| Omichlospora | Not present in Tineidae NHM as accepted genus; species Infurcitinea incertula blocked. |
| Oxylychna | 2023-04-21 |
| Pompostolella | 2023-04-21 |
| Stenoptinea | 2023-04-21 |
| Tenaga | 2023-04-21 |
| Tineiforma | Not present in Tineidae NHM as accepted genus; species Infurcitinea sardica blocked. |
| Trissochyta | 2023-04-21 |
| Xeringinia | 2023-04-21 |

yroskov commented 1 year ago

TineidaeNHM re-synced 2023-04-26

yroskov commented 1 year ago

Global Lepidoptera Index re-synced 2023-04-26

yroskov commented 1 year ago

As presented on 2023-04-26 in Gelechiidae 1.1.23.115 (25 Apr 2023): [image]

Gelechiidae re-synced 2023-04-26

Checks of COL23.4, 2023-04-26, id 9889: (D) genus Phyllocnistis in Gelechiidae vs Global Gracillariidae = NOT FIXED [3 images]

Gelechiidae 1.1.23.117 (27 Apr 2023) / 2023-04-27 re-synced 2023-04-27

Results: NOT FIXED. Genus Phyllocnistis (with a single species) is flagged in the Gelechiidae GSD as "accepted"; the species name Phyllocnistis spilota is also flagged as "accepted". For the attention of @mdoering & @dhobern:
https://www.checklistbank.org/catalogue/3/names?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&limit=50&offset=0&q=Phyllocnistis%20spilota&sortBy=taxonomic

[2 images]

yroskov commented 1 year ago

Gelechiidae re-synced 2023-04-26

Checks 2023-04-27:

[3 images]

dhobern commented 1 year ago

Aaargh - sorry. I somehow reverted the provisional flags on the unplaced species, including [Phyllocnistis] spilota - these are fixed again for next time you import.

yroskov commented 1 year ago

> Either via Taxon.provisional or NameUsage.status=provisionally accepted depending on how they share their data.

@mdoering, it seems the "provisional" flag does not work with "unplaced genera": genus Phyllocnistis (with the single species name Phyllocnistis spilota) still appears in the Gelechiidae GSD as "accepted". This happened after the sync of the fixed Gelechiidae ver. 1.1.23.117 (27 Apr 2023) / 2023-04-27; see my report two blocks above.