CatalogueOfLife / data

Repository for COL content
Redundant, accepted higher taxa #463

Open mdoering opened 2 years ago

mdoering commented 2 years ago

There are several "homonyms" existing in the Animal kingdom which should be resolved:

mdoering commented 2 years ago

In addition there are also 2 cases involving protists which might be valid but should be checked:

DaveNicolson commented 2 years ago

At least one of these (the only one I checked so far) has already been addressed in ITIS:

Sphecoid wasp family name Heterogynidae Nagy, 1969 is a junior homonym of lepidopteran family Heterogynidae Rambur, 1866. The stem and family name were emended (to Heterogyna- and Heterogynaidae) by the ICZN in Opinion 1445 (1987)

BUT since COL gets the Apoidea from ITIS, it should already be resolved in COL. In fact, it is resolved in COL, so I wonder if some of the others you list are already resolved too?

https://preview.catalogueoflife.org/data/taxon/624TS

DaveNicolson commented 2 years ago

Leptosomatidae was already resolved in ITIS' birds, and so is already resolved in COL. This comment describes the issue & resolution in ITIS (Leptosomidae is the correct name for the bird family): "The bird family spelling variant 'Leptosomatidae' (TSN 178131), correctly given as Leptosomidae (TSN 553456), was suppressed in Opinion 1068 (1977). The name was originally published by Blyth as subfamily Leptosominae. The suppressed name would have caused homonymy issues with the nematode family Leptosomatidae Filipjev, 1916 (TSN 62709)"

DaveNicolson commented 2 years ago

The mammal (bat) Macroglossinae junior homonym was replaced by Macroglossusinae Almeida, Simmons and Giannini, 2020 in ITIS, and in COL, so this is also already resolved. Can you re-run your check to filter out the cases involving names that are in synonymy?

DaveNicolson commented 2 years ago

Protist family Sagittariidae Grandori & Grandori, 1935 vs. bird family Sagittariidae [authorship TBD]... I'm not entirely sure who is the proper author of the bird name per ICZN, but it was already in use by 1930. But unless they are both treated under the "zoological" Code, it would not be an issue. The type genera are not identical (-us bird vs. -a protist).

mdoering commented 2 years ago

Oh, sorry. Some of these seem to have slipped through as synonyms! That is fine, of course.

mdoering commented 2 years ago

The results above were an outcome of some entirely different coding effort. When I analyze just the higher, accepted Animal taxa I get a much larger number of redundant names. Here are the ones that are left if I remove all unique names. The list is huge, but there might be cases where the same name is used at different ranks. Not sure if we should see that above genus level. But the ones with 4 copies or more I checked are real homonyms that should not exist in COL:

mdoering commented 2 years ago

A lot of them are tribes...

mdoering commented 2 years ago

See also https://github.com/CatalogueOfLife/data/issues/464

mdoering commented 1 year ago

@thomasstjerne We have specific tasks to report on dupes for orders, superfamilies, families, genera and subgenera. I would suggest also adding a task for tribes, as these seem to slip through with current procedures.
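A per-rank duplicate report of the kind described above could be sketched roughly as follows. This is only an illustration: the record layout (`rank`, `name`, `status`), the helper `duplicates_by_rank` and the sample names are all hypothetical and are not taken from ChecklistBank or COL.

```python
from collections import defaultdict

# Hypothetical, simplified records; the names are made up for illustration.
TAXA = [
    {"rank": "family", "name": "Exampleidae", "status": "accepted"},
    {"rank": "family", "name": "Exampleidae", "status": "accepted"},
    {"rank": "family", "name": "Otheridae", "status": "accepted"},
    {"rank": "tribe", "name": "Exampleini", "status": "accepted"},
    {"rank": "tribe", "name": "Exampleini", "status": "accepted"},
    {"rank": "tribe", "name": "Exampleini", "status": "synonym"},
]


def duplicates_by_rank(taxa, ranks):
    """Return {rank: {name: count}} for accepted names used more than once at one rank."""
    counts = defaultdict(lambda: defaultdict(int))
    for taxon in taxa:
        # Synonyms are skipped: only accepted usages count as redundant.
        if taxon["status"] == "accepted" and taxon["rank"] in ranks:
            counts[taxon["rank"]][taxon["name"]] += 1
    report = {}
    for rank, names in counts.items():
        dupes = {name: n for name, n in names.items() if n > 1}
        if dupes:
            report[rank] = dupes
    return report


# Including "tribe" in the set of ranks is the addition suggested above.
REPORT = duplicates_by_rank(
    TAXA, ranks={"order", "superfamily", "family", "tribe", "genus", "subgenus"}
)
print(REPORT)
```

Leaving `"tribe"` out of `ranks` would hide the second group, which is the gap the comment above points at.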

mdoering commented 1 year ago

The duplicate tool shows about 2200 duplicate accepted uninomials in COL within the same nomenclatural code: https://www.checklistbank.org/catalogue/3/duplicates?catalogueKey=3&category=uninomial&codeDifferent=false&limit=100&minSize=2&mode=STRICT&rankDifferent=false&status=accepted

This is clearly wrong. Even the task board shows the problems quickly. Identical orders, superfamilies and families should not exist in any release:

image

There are >200 higher supergeneric duplicates that should be addressed quickly.

The current procedure for dealing with these duplicates is to create editorial decisions that apply a provisionally accepted status to them, so we end up with just a single fully accepted name. That is not ideal, and we should think about ways to fully exclude duplicates. We should consider following a procedure similar to the one for unplaced species, which are placed directly under a family, skipping a genus parent. @dhobern @olafbanki could we come up with some written COL checklist guidelines so that users know what to expect from the product, and editors and sources know how we want common problems to be addressed?

yroskov commented 1 year ago

The duplicate tool shows about 2200 duplicate accepted uninomials in COL within the same nomenclatural code:

I can see 5 groups of duplicates in this report :

1) Names under different Codes. Any ideas on what to do in this case?

Examples:

image image image

2) Names under the same Code, but without authorship (or authorship without a year). Editorial decision requires a lot of additional investigation (need a good team of experts).

Examples:

image image image

3) Names under the same Code, with authorships with a year. This set of names can be resolved easily. I just need a guarantee that decisions on duplicates across GSDs will be preserved and re-applied with each monthly release.

Example:

image

4) Special case of Systema Dipterorum requires editorial resolution of conflicts inside Systema Dipterorum.

Example:

image

5) Cases related to PaleoBioDB, IRMNG & regional ITIS data imported from the AC19 classification. Taxonomy Group help with these data in CoL is very welcome: which redundant taxa need to be removed.
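Groups 1 to 3 in the list above can be told apart mechanically, which is sketched below. This is a rough illustration under assumptions: the `Usage` record and the functions `triage` and `junior` are hypothetical, groups 4 and 5 are source-specific and not covered, and the sample pair reuses the two Heterogynidae authorships quoted earlier in this thread only because their years are given there.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """One accepted usage of a uninomial (hypothetical, simplified record)."""
    name: str
    code: Optional[str]        # e.g. "zoological" or "botanical"; None if not set
    authorship: Optional[str]
    year: Optional[int]


def triage(a: Usage, b: Usage) -> str:
    """Sort a pair of identical accepted uninomials into groups 1-3."""
    if a.code and b.code and a.code != b.code:
        return "group 1: different codes"
    if a.year is None or b.year is None:
        return "group 2: authorship or year missing, needs expert review"
    return "group 3: resolvable by year"


def junior(a: Usage, b: Usage) -> Usage:
    """For a group 3 pair, return the more recent usage (b if the years are equal)."""
    return a if a.year > b.year else b


OLDER = Usage("Heterogynidae", "zoological", "Rambur", 1866)
NEWER = Usage("Heterogynidae", "zoological", "Nagy", 1969)

print(triage(OLDER, NEWER))
print(junior(OLDER, NEWER).authorship)
```

The usage returned by `junior` is the candidate for a "provisionally accepted" decision under the current procedure; whether that decision survives each release is the open question raised in item 3.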

yroskov commented 1 year ago

There are >200 higher supergeneric duplicates that should be addressed quickly.

I was able to resolve only 6 of the 200 duplicates, those from group 3. The others belong to groups 2, 4 and 5 mentioned above.

Well, applied decisions are not shown in this report.

mdoering commented 1 year ago
  1. Names under different Codes. Any ideas on what to do in this case?

The query was to ignore different codes as these are valid names. I suspect we do not have the code set in all names. Maybe you can update that for those names? If the original data does not yet contain a nomenclatural code, this can be specified in a sector's settings. If you give me a list of source datasets per code, I could also apply that setting in the database for all its sectors, because I can see this can get tedious. Please think of setting the code for new sectors.

mdoering commented 1 year ago
  3. Names under the same Code, with authorships with a year. This set of names can be resolved easily. I just need a guarantee that decisions on duplicates across GSDs will be preserved and re-applied with each monthly release.

Ah, I see. I am indeed not sure if we can apply decisions in the project yet. We started that work, but I am not certain if it was ever finished. @thomasstjerne do you remember what the current situation is with decisions created via the project instead of the source view? If it works (the UI needs to retrieve different dataset/project keys) they would end up just as regular decisions in the source and are as stable as others. If I am not mistaken you cannot view decisions in the project view though as the API does not support that.

mdoering commented 1 year ago

@dhobern:

5. Cases related to PaleoBioDB, IRMNG & regional ITIS data imported from the AC19 classification. Taxonomy Group help with these data in CoL is very welcome: which redundant taxa need to be removed.

yroskov commented 1 year ago

Actually, every item in this discussion is for Taxonomy Group - @dhobern

I would also like to see TG assessment of Systema Dipterorum classification/data (and WTaxa, TITAN). We have many internal conflicts there, which we cannot repair on our side with @gdower

yroskov commented 1 year ago

Another question: does ICZN prohibit homonymy (or even regulate names) above family-group? GSDs may ignore any kind of best practice introduced by CoL or GBIF.

mdoering commented 1 year ago

Names above family are not regulated by ICZN. But does that mean COL should not care?

DaveNicolson commented 1 year ago

Actually, names above the family-group (superfamily is the highest) are regulated under the ICZN, but only certain Articles apply to them (e.g., Priority does not apply). Specifically, per Art. 1.2.2, "Articles 1-4, 7-10, 11.1-11.3, 14, 27, 28 and 32.5.2.5 also regulate names of taxa at ranks above the family group." Homonymy is handled in Arts. 52-60, so it does not apply to names above the family-group ranks. Also keep in mind that homonymy in the ICZN is considered only within a given rank-group (within the family-group ranks, within the genus-group ranks, and within the species-group ranks). So no help from that......
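The rank-group rule described above can be written down as a small lookup. This is a minimal sketch, not an implementation of the Code: the function names are hypothetical and the rank list is a deliberately short selection that I am assuming is sufficient for illustration.

```python
from typing import Optional

# The three rank groups within which ICZN homonymy is considered.
# The list of ranks per group is a short, non-exhaustive selection.
RANK_GROUP = {
    "superfamily": "family-group",
    "family": "family-group",
    "subfamily": "family-group",
    "tribe": "family-group",
    "subtribe": "family-group",
    "genus": "genus-group",
    "subgenus": "genus-group",
    "species": "species-group",
    "subspecies": "species-group",
}


def rank_group(rank: str) -> Optional[str]:
    """Return the rank group, or None for ranks above the family group."""
    return RANK_GROUP.get(rank.lower())


def iczn_homonymy_applies(rank_a: str, rank_b: str) -> bool:
    """True only if both ranks fall in the same rank group."""
    group_a, group_b = rank_group(rank_a), rank_group(rank_b)
    return group_a is not None and group_a == group_b


print(iczn_homonymy_applies("family", "tribe"))   # same group
print(iczn_homonymy_applies("order", "order"))    # above the family group
print(iczn_homonymy_applies("family", "genus"))   # different groups
```

Under this reading, two identical orders are not homonyms in the ICZN sense, so any rule against them in COL would have to be an editorial policy rather than a nomenclatural one.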

DaveNicolson commented 1 year ago

Within the ranks that do deal w/homonymy, there is also this little detail, which is probably not part of the issue here (probably not a common issue): "2.2. Names of taxa at some time but not later classified as animals Any available name of a taxon that has at any time been classified as animal continues to compete in homonymy in zoological nomenclature even though the taxon is later not classified as animal."

yroskov commented 1 year ago

I suspect we do not have the code set in all names. Maybe you can update that for those names? If the original data does not yet contain a nomenclatural code this can be specified in a sectors settings. If you give me a list of source datasets per code I could also apply that setting in the database for all its sectors, cause I can see this can get tedious.

I have checked all GSDs@CoL. Almost all of those managed by Geoff and me have a Code assigned. A few GSDs are managed by other people, and I cannot check their Code assignment. A set of checklists contains taxa under different Codes (marked in my report as "mixed"): how should a Code be assigned to their sectors?

GSD | id | Code in CLB | Code as should be
-- | -- | -- | --
3i Auchenorrhyncha | 2317 | zoo |
3i Curculio | 1166 | zoo |
Alucitoidea | 2207 | no access | zoo
Animal Biodiversity | 1502 | zoo |
AnnonBase | 1040 | bot |
BdelloideaBase | 1080 | zoo |
Brassicaceae | 2305 | 0 | bot
Brentids | 1161 | zoo |
CarabCat | 1146 | zoo |
CCW | 1005 | zoo |
ChiloBase | 1042 | zoo |
CilCat | 1113 | zoo |
Collembola.org | 2130 | zoo |
Conifer Database | 1045 | bot |
COOL | 1052 | zoo |
Droseraceae Database | 1066 | bot |
ELPT | 1074 | bot |
FADA Cladocera | 1138 | zoo |
FADA Ephemeroptera | 1120 | zoo |
FADA Halacaridae | 1139 | zoo |
FADA Nematomorpha | 1119 | zoo |
FADA Rotifera | 1047 | zoo |
FishBase | 1010 | zoo |
FLOW | 1011 | zoo |
Fossil Ginkgoales | 1201 | bot |
Gelechiidae | 2362 | no access | zoo
Global Gracillariidae | 1049 | zoo |
Global Lepidoptera Index | 55434 | no access | zoo
GloBIS (GART) | 1046 | zoo |
Gymnodinium | 1177 | 0 | [bot]
HymIS Crabronidae & Rhopalosomatidae | 1118 | zoo |
HymIS Pompilidae | 2141 | zoo |
ICTV MSL | 1014 | vir |
IRMNG | 2007 | 0 | mixed
ITIS | 2144 | 0 | mixed
Jewel Beetles | 1190 | zoo |
Lace Bugs Database | 1144 | zoo |
LDL Neuropterida | 1055 | zoo |
MBB | 1076 | zoo |
Microsporidia | 1148 | 0 | [zoo or bot]
Mites GSD Ologamasidae | 1063 | zoo |
Mites GSD Phytoseiidae | 1070 | zoo |
Mites GSD Rhodacaridae | 1069 | zoo |
Mites GSD Tenuipalpidae | 1078 | zoo |
MOST | 1019 | bot |
MOWD | 1096 | zoo |
Nepticuloidea | 1172 | zoo |
nomen.eumycetozoa.com | 1053 | bot |
Odonata | 1020 | zoo |
PaleoBioDB | 1174 | 0 | mixed
Parhost | 1022 | zoo |
PBI Plant Bug | 1171 | zoo |
Phoronida Database | 1104 | zoo |
Psyllist | 1054 | zoo |
Pterophoroidea | 1199 | zoo |
ReptileDB | 1008 | zoo |
RJB Geranium | 1048 | bot |
ScaleNet | 1026 | zoo |
Scarabs | 1027 | zoo |
Sepidiini tribe | 1206 | zoo |
SF Aphid | 1061 | zoo |
SF Chrysididae | 1169 | zoo |
SF Cockroach | 1051 | zoo |
SF Coleorrhyncha | 1192 | zoo |
SF Coreoidea | 1134 | zoo |
SF Dermaptera | 1158 | zoo |
SF Embioptera | 1089 | zoo |
SF Grylloblattodea | 1170 | zoo |
SF Isoptera | 1198 | zoo |
SF Lygaeoidea | 1173 | zoo |
SF Mantodea | 1062 | zoo |
SF Mantophasmatodea | 1168 | zoo |
SF Orthoptera | 1021 | zoo |
SF Phasmida | 1050 | zoo |
SF Plecoptera | 1065 | zoo |
SF Psocodea | 1133 | zoo |
SF Zoraptera | 1167 | zoo |
Species Fungorum Plus | 2073 | bot |
SpmWeb | 1082 | zoo |
StaphBase | 1204 | zoo |
Systema Dipterorum | 1101 | zoo |
Taxapad Ichneumonoidea | 1068 | zoo |
Tessaratomidae Database | 1143 | zoo |
The Scorpion Files | 1164 | zoo |
The White-Files | 1142 | zoo |
The World List of Cycads | 1163 | bot |
ThripsWiki | 1203 | zoo |
TicksBase | 1030 | zoo |
Tineidae NHM | 1031 | zoo |
TITAN | 1032 | zoo |
Trichomycetes | 1033 | bot |
UCD | 1034 | zoo |
WCO | 2256 | zoo |
WCVP | 2232 | no access | bot
WCVP-Fabaceae | 2304 | no access | bot
World Ferns | 1140 | bot |
World Plants | 1141 | bot |
WoRMS Actiniaria | 1176 | zoo |
WoRMS Amphipoda | 1202 | zoo |
WoRMS Antipatharia | 1194 | zoo |
WoRMS Appendicularia | 1178 | zoo |
WoRMS Ascidiacea | 1186 | zoo |
WoRMS Asteroidea | 1095 | zoo |
WoRMS Bochusacea | 1086 | zoo |
WoRMS Brachiopoda | 2299 | zoo |
WoRMS Brachypoda | 1087 | zoo |
WoRMS Brachyura | 1108 | zoo |
WoRMS Bryozoa | 1081 | zoo |
WoRMS Cephalochordata | 1154 | zoo |
WoRMS Ceriantharia | 1179 | zoo |
WoRMS Cestoda | 1127 | zoo |
WoRMS Chaetognatha | 1132 | zoo |
WoRMS Copepoda | 1191 | zoo |
WoRMS Corallimorpharia | 1195 | zoo |
WoRMS Crinoidea | 2300 | zoo |
WoRMS Ctenophora | 1180 | zoo |
WoRMS Cubozoa | 1181 | zoo |
WoRMS Cumacea | 1058 | zoo |
WoRMS Echinoidea | 1106 | zoo |
WoRMS Euphausiacea | 2301 | zoo |
WoRMS Foraminifera | 1157 | zoo |
WoRMS Gastrotricha | 1122 | zoo |
WoRMS Gnathostomulida | 1125 | zoo |
WoRMS Holothuroidea | 1107 | zoo |
WoRMS Hydrozoa | 1112 | zoo |
WoRMS Isopoda | 1094 | zoo |
WoRMS Kinorhyncha | 1153 | zoo |
WoRMS Leptostraca | 1105 | zoo |
WoRMS Loricifera | 1182 | zoo |
WoRMS Merostomata | 1152 | zoo |
WoRMS Mollusca | 1130 | zoo |
WoRMS Monogenea | 1126 | zoo |
WoRMS Myriapoda | 1200 | zoo |
WoRMS Mystacocarida | 1088 | zoo |
WoRMS Myxozoa | 1129 | zoo |
WoRMS Nematoda | 2302 | zoo |
WoRMS Nemertea | 1085 | zoo |
WoRMS Octocorallia | 1131 | zoo |
WoRMS Oligochaeta | 1099 | zoo |
WoRMS Ophiuroidea | 1059 | zoo |
WoRMS Orthonectida | 1149 | zoo |
WoRMS Ostracoda | 1175 | zoo |
WoRMS Placozoa | 1123 | zoo |
WoRMS Polychaeta | 1090 | zoo |
WoRMS Polycystina | 1109 | zoo |
WoRMS Porifera | 1044 | zoo |
WoRMS Priapulida | 1124 | zoo |
WoRMS Pycnogonida | 1183 | zoo |
WoRMS Remipedia | 1091 | zoo |
WoRMS Rhombozoa | 1150 | zoo |
WoRMS Scleractinia | 1196 | zoo |
WoRMS Scyphozoa | 1188 | zoo |
WoRMS Staurozoa | 1184 | zoo |
WoRMS Strepsiptera | 1103 | zoo |
WoRMS Tanaidacea | 1110 | zoo |
WoRMS Tantulocarida | 1092 | zoo |
WoRMS Thaliacea | 1185 | zoo |
WoRMS Thermosbaenacea | 1093 | zoo |
WoRMS Trematoda | 1128 | zoo |
WoRMS Turbellarians | 1193 | zoo |
WoRMS Xenoturbellida | 1100 | zoo |
WoRMS Zoantharia | 1197 | zoo |
WSC | 1029 | zoo |
WTaxa | 1039 | zoo |
ZOBODAT Vespoidea | 1037 | zoo |

mdoering commented 1 year ago

I have updated:

These settings only take effect when we (re)import the datasets.

yroskov commented 1 year ago

Thanks!

@mdoering, GO (@gdower) works on Codes for ITIS sectors. Our question: do we need to do re-import/re-sync after Code assignment for the sector?

mdoering commented 1 year ago

For mixed datasets we should set the code in the sector settings. These get applied during syncs, not imports. I can only find changing sector settings in the assembly tree now:

image

After changing the ITIS beetle sector to zoological it already shows in the sector context menu:

image

@thomasstjerne did we not offer sector editing straight in the sector table view?

gdower commented 1 year ago

I'm working on adding the nom code to ITIS sectors. I think I had done it previously but some of the new sectors don't have it. Having the nom code display and be editable in the sectors table would be great.

mdoering commented 1 year ago

The settings in both the dataset and sectors are only applied at import / sync time, so if they change this needs a reimport or resync respectively. But I guess we can wait for many datasets/sectors for the next regular update. I would just force an update for the rarely updated ones now.

gdower commented 1 year ago

Can you tell if the AC19 datasets have code set? I think it's likely we did already re-import and re-sync them after setting the code.

yroskov commented 1 year ago

Examples of "AC19 databases" (i.e. checklists which were imported in CLB from AC19 and were not updated since that time):

Conifer Database, Taxapad Ichneumonoidea, The White-Files, TicksBase, Tineidae NHM, TITAN, UCD

mdoering commented 1 year ago

I can query for all names in the project that do not have a code set and group them by their source dataset key. There are surprisingly many with small numbers, often 1, which feels like a code problem.

Then there are 3122 without any dataset/sector. I suspect these are the management hierarchy or relicts of sector deletions. It's going to be harder to get them right, but the higher management classification in particular is important to have with a code so we can query for dupes properly.

col=> select s.subject_dataset_key, count(*) from name_3 n left join sector s on s.id=n.sector_key and s.dataset_key=n.dataset_key where n.code is null group by 1 order by 1;
 subject_dataset_key | count  
---------------------+--------
                1010 |   5758
                1011 |   2697
                1019 |   1062
                1020 |    690
                1022 |    270
                1026 |   1066
                1029 |   4390
                1030 |     21
                1031 |    342
                1032 |   4962
                1033 |     28
                1034 |   2066
                1037 |    324
                1040 |    123
                1042 |    430
                1044 |      2
                1045 |     80
                1046 |    118
                1047 |    177
                1048 |      1
                1052 |    367
                1053 |    135
                1054 |    286
                1055 |   1280
                1058 |      1
                1059 |      1
                1063 |     46
                1066 |      4
                1068 |   2597
                1069 |     16
                1070 |     86
                1074 |    564
                1076 |     18
                1078 |     34
                1080 |     40
                1081 |     12
                1082 |     76
                1085 |      2
                1086 |      1
                1087 |      1
                1088 |      1
                1090 |     18
                1091 |      1
                1092 |      1
                1093 |      1
                1094 |      3
                1095 |      1
                1096 |    405
                1099 |      3
                1101 |  15053
                1103 |      1
                1104 |      4
                1105 |      1
                1106 |     39
                1107 |      1
                1109 |    217
                1110 |      2
                1112 |      2
                1113 |   1262
                1118 |      9
                1119 |     23
                1120 |    491
                1122 |      1
                1123 |      1
                1124 |      1
                1125 |      1
                1126 |      2
                1127 |      3
                1128 |     13
                1129 |      5
                1130 |     85
                1131 |      2
                1132 |      1
                1138 |    134
                1139 |     18
                1142 |    160
                1143 |     61
                1144 |    285
                1146 |      1
                1148 |   1838
                1149 |      5
                1150 |      4
                1152 |      1
                1153 |      1
                1154 |      1
                1157 |     72
                1161 |    291
                1171 |   1531
                1172 |     33
                1174 |   4358
                1175 |     20
                1176 |      4
                1177 |    410
                1178 |      1
                1179 |      1
                1180 |      1
                1181 |      1
                1182 |      1
                1183 |      1
                1184 |      1
                1185 |      1
                1186 |      1
                1188 |      2
                1190 |    562
                1191 |     14
                1193 |     19
                1194 |      1
                1195 |      1
                1196 |      7
                1197 |      1
                1200 |     45
                1201 |     21
                1202 |      3
                2007 |   1078
                2073 |  13258
                2141 |    284
                2232 | 418913
                2256 |     80
                2299 |      1
                2300 |      2
                2301 |      1
                2302 |    269
                2304 |  82563
               55434 |    396
                     |   3122
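
The shape of the query above can be tried out on a toy database. The sketch below uses SQLite in memory with a made-up, heavily simplified schema: the table `name` stands in for `name_3`, the column set is an assumption reduced to what the query touches, and the five rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sector (id INTEGER, dataset_key INTEGER, subject_dataset_key INTEGER);
CREATE TABLE name (id TEXT, dataset_key INTEGER, sector_key INTEGER, code TEXT);
INSERT INTO sector VALUES (1, 3, 1010), (2, 3, 1101);
INSERT INTO name VALUES
  ('a', 3, 1, NULL),
  ('b', 3, 1, 'zoological'),
  ('c', 3, 2, NULL),
  ('d', 3, 2, NULL),
  ('e', 3, NULL, NULL);
""")

# Same structure as the query above: names without a code, grouped by the
# source dataset of their sector. The LEFT JOIN keeps names that have no
# sector at all; they end up in the group whose key is NULL (None in Python).
ROWS = conn.execute("""
SELECT s.subject_dataset_key, count(*)
FROM name n
LEFT JOIN sector s ON s.id = n.sector_key AND s.dataset_key = n.dataset_key
WHERE n.code IS NULL
GROUP BY s.subject_dataset_key
""").fetchall()

COUNTS = dict(ROWS)
print(COUNTS)
```

The `None` key plays the role of the last row in the listing above, the names without any dataset/sector.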
yroskov commented 1 year ago

3122 taxa may remain as classification data from ITIS Regional in AC19 - my guess

dhobern commented 1 year ago

For what it's worth, Tineidae is next on my list of datasets to try to improve after the Geometridae and Pyralidae/Crambidae.


gdower commented 1 year ago

@mdoering, it's odd that 2144 isn't in your list because at least a few ITIS sectors didn't have nom code set, unless you beat me to setting nom code and syncing them before running that query? All ITIS sectors should have nom code set now and I'm syncing them. There might have only been 2 ITIS sectors without nom code set.

IRMNG doesn't have nom code set for most sectors, but it's unclear what it should be for some names:

botanical: Pinguiophyceae
botanical: Xanthophyceae
botanical: Schizocladiophyceae
botanical: Raphidophyceae
botanical: Picophagophyceae
botanical: Phaeophyceae
botanical: Phaeothamniophyceae
botanical: Bolidophyceae
botanical: Eustigmatophyceae
botanical: Chrysomerophyceae
botanical: Dictyochophyceae
botanical: Chrysophyceae
Discosea
Paramastigaceae
Stephanomonadaceae
Nephridiophagidae
Sulcozoa
Acavomonidia
Haptophyta
Cryptista
Apicomonadea
Coccidiomorphea
Colponemea
Syndinea
Distomatopyxidae
Cryptodifflugiidae
Microcoryciidae
Plagiopyxidae
Rhodophyta
Glaucophyta
Chlorophyta
Charophyta
Schizopyrenida
Tubulinea
Variosea
Calcitarcha
Choanoflagellatea
Filasterea
Cristidiscoidea
Rozellidea
Euglenozoa
Loukozoa
Metamonada
Brasilobiaceae
Jolyaceae
Ellobiopsea
Piroplasmida
Noctilucea
Archigregarinida
Againococcidiida
Eugregarinida
Eucoccidiida
Oxyrrhea
Perkinsea
Pellitidae
Microchlamyiidae
Lesquereusiidae
Lamtopyxidae
Hyalospheniidae
Heleoperidae
Acephalidae
Neogregarinida
Protococcidiida
Endohelea
Picozoa
Trichosidae

gdower commented 1 year ago

@mdoering, I noticed that there's a mix of implementations in ITIS sectors for where the code is defined. Does it matter if it is nested inside of the subject or are you taking code from either?

{
  "subject": {
    "code": "zoological",
    ...
  }
}

{
  "subject": {
    ...
  },
  "code": "zoological"
}

{
  "subject": {
    "code": "zoological",
    ...
  },
  "code": "zoological"
}

yroskov commented 1 year ago

These IRMNG sectors should be botanical for sure: Haptophyta, Rhodophyta, Glaucophyta, Chlorophyta, Charophyta

mdoering commented 1 year ago

@mdoering, I noticed that there's a mix of implementations in ITIS sectors for where the code is defined. Does it matter if it is nested inside of the subject or are you taking code from either?

These have different purposes. The root code property is the one that defines which code to apply as the default for all names that are synced, if a name does not already have one.

The other 2 are part of the subject or target definition and help to find the name during a rematch.

So only the root one is what we are after here.
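
Read literally, the rule above amounts to a fallback: a name keeps its own code, otherwise it receives the sector's root-level `code`, and the code nested in `subject` plays no part in that. A minimal sketch of this reading follows; the function `code_after_sync` and the dictionaries are hypothetical and do not reflect the actual ChecklistBank sync code.

```python
from typing import Optional


def code_after_sync(name_code: Optional[str], sector: dict) -> Optional[str]:
    """Code a name would carry after a sync, under the rule described above."""
    if name_code is not None:
        return name_code        # a code already set on the name is kept
    # Only the root-level property acts as the default; the code nested in
    # "subject" (or "target") is used for rematching and is ignored here.
    return sector.get("code")


SECTOR_ROOT = {"subject": {}, "code": "zoological"}
SECTOR_NESTED_ONLY = {"subject": {"code": "zoological"}}

print(code_after_sync(None, SECTOR_ROOT))
print(code_after_sync(None, SECTOR_NESTED_ONLY))
print(code_after_sync("botanical", SECTOR_ROOT))
```

On this reading, a sector configured like `SECTOR_NESTED_ONLY` would leave codeless names without a code even though "zoological" appears in its settings.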

thomasstjerne commented 1 year ago

@thomasstjerne did we not offer sector editing straight in the sector table view?

Yes, but it is tied to the new tabbed sector view with priority management and sync. So it will only be available when the xcol branch is merged in.

yroskov commented 1 year ago

Just in case. Now, we are sure that Code assigned with all GSDs (except IRMNG). However, there are still cases like WCVP-Fabaceae vs 3i Auchenorrhyncha, WoRMS Brachyura vs WCVP, WoRMS Trematoda vs Species Fungorum, etc. in the report: (setting "Code different: No")

https://www.checklistbank.org/catalogue/3/duplicates?catalogueKey=3&category=uninomial&codeDifferent=false&limit=999&minSize=2&mode=STRICT&rankDifferent=false&status=accepted&withDecision=false
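
For readers who want to see which filters that report link carries, the query string can be unpacked with the Python standard library. Nothing here calls ChecklistBank; the snippet only parses the URL quoted above, and the variable names are mine.

```python
from urllib.parse import parse_qs, urlsplit

URL = (
    "https://www.checklistbank.org/catalogue/3/duplicates"
    "?catalogueKey=3&category=uninomial&codeDifferent=false&limit=999"
    "&minSize=2&mode=STRICT&rankDifferent=false&status=accepted"
    "&withDecision=false"
)

# parse_qs returns a list per key; every key occurs once in this URL,
# so the first element is taken.
PARAMS = {key: values[0] for key, values in parse_qs(urlsplit(URL).query).items()}

for key in sorted(PARAMS):
    print(f"{key} = {PARAMS[key]}")
```

The `codeDifferent=false` entry is the "Code different: No" setting mentioned above, and `withDecision=false` is the only parameter that the earlier link in this thread did not have.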

yroskov commented 1 year ago

Experiment with the duplications tool in the project: I did resolve three genera from my group 3: Abantiades, Abaris, Absonus (most recent marked as Prov Acc). I do not see any difference, these names are still in the report, their status was not changed in the project.

mdoering commented 1 year ago

Experiment with the duplications tool in the project: I did resolve three genera from my group 3: Abantiades, Abaris, Absonus (most recent marked as Prov Acc). I do not see any difference, these names are still in the report, their status was not changed in the project.

Did you sync the respective sectors?

mdoering commented 1 year ago

Just in case. Now, we are sure that Code assigned with all GSDs (except IRMNG). However, there are still cases like WCVP-Fabaceae vs 3i Auchenorrhyncha, WoRMS Brachyura vs WCVP, WoRMS Trematoda vs Species Fungorum, etc. in the report: (setting "Code different: No")

https://www.checklistbank.org/catalogue/3/duplicates?catalogueKey=3&category=uninomial&codeDifferent=false&limit=999&minSize=2&mode=STRICT&rankDifferent=false&status=accepted&withDecision=false

Again, did you sync (code set in sector settings) or import & sync (code set in dataset options)? Looking at the 2 Abrus names the WCVP one is still missing a code: https://www.checklistbank.org/catalogue/3/name/9ea30e1c-583f-4eeb-b4d5-6ca79906b7b2 https://www.checklistbank.org/catalogue/3/name/f7c23d08-02de-4f3a-9116-6134007290ae

yroskov commented 1 year ago

Whoops, no. Doing it now.

Synced 2022-11-18: WTaxa Abantiades Broun, T., 1914, Abaris Voss, E., 1958 & WoRMS genus Absonus Rubio & Rolán, 2021, to check whether the "prov acc" status set via the TASKS-project tool works or not. @mdoering, it looks like the TASKS-project tool does not work. I applied the "provisionally accepted" status to the three genera above, but their status had not changed after the syncs and the PREVIEW release: https://github.com/CatalogueOfLife/testing/issues/211#issuecomment-1320606882

It is impossible to resolve duplicated uninomial taxa at this moment.

GSDs with assigned Code synced/re-synced as well: ITIS, Gelechiidae, Brassicaceae, WCVP, WCVP-Fabaceae, Global Lepidoptera Index

No changes in the report on duplicated uninomial taxa. Duplicates inside the same Code remain in the report despite the setting "Code different: No".

yroskov commented 1 year ago

For attention of @mdoering:

Just in case. Now, we are sure that Code assigned with all GSDs (except IRMNG). However, there are still cases like WCVP-Fabaceae vs 3i Auchenorrhyncha, WoRMS Brachyura vs WCVP, WoRMS Trematoda vs Species Fungorum, etc. in the report: (setting "Code different: No")

I have synced/re-synced all related GSDs on 2022-11-18. However, cases inside one Code like WCVP-Fabaceae vs 3i Auchenorrhyncha, WoRMS Brachyura vs WCVP, WoRMS Trematoda vs Species Fungorum, etc. are in the report with the setting "Code different: No": https://www.checklistbank.org/catalogue/3/duplicates?catalogueKey=3&category=uninomial&codeDifferent=false&limit=100&minSize=2&mode=STRICT&offset=0&rankDifferent=false&status=accepted

mdoering commented 1 year ago

@yroskov I can try to update all those broken decisions that were created via the project dupe tool in the past week using database sql calls: https://www.checklistbank.org/catalogue/3/decision?broken=true&limit=100&offset=0&subjectDatasetKey=1039

yroskov commented 1 year ago

@mdoering, could you please have a look at the Slack message? We have an urgent and severe problem in the plant data due to Friday's WCVP sync: https://app.slack.com/client/T7PM2N197/C7PH3N5PS

mdoering commented 1 year ago

There are still 215 redundant families and above in the COL checklist, 189 (sub)tribes and 207 genera with the same authorship.

mdoering commented 1 year ago

See also https://github.com/CatalogueOfLife/data/issues/464

That was resolved up to the level of my ability.

yroskov commented 1 year ago

There are still 215 redundant families and above in the COL checklist

Thank you, Markus! I'll look through the list. I feel, there are different categories of issues in this list.

Category 1: homonyms across GSDs = FIXED, junior flagged as provisionally accepted. Whoops! The decision disappeared after the page was refreshed in the browser. It seems that a name status changed via a decision in the GSD is not carried over to the project (see https://github.com/CatalogueOfLife/backend/issues/1224).

Example: image

Category 2: homonyms inside GSD = supposed to be fixed by flagging the junior as provisionally accepted; however, I got an error (see below): image

It seems that a name status changed via a decision in the GSD is not carried over to the project.

Category 3: duplicates inside GSD with identical placement in the classification
3A: same name, same authorship
3B: same name, but one with empty authorship

I have checked a few names in GSDs@CLB from this category, and I cannot understand how identical children can appear in CLB. Could it be a problem with the importer or with CoLDP?

Examples: image image

image

Category 4: duplicates inside GSD with different placements in the classification
4A: unresolved classification in GSD
4B: incertae sedis cases (similar to incertae sedis genera)

Category 5: duplicates between GSD and CoL management classification. Taxa in CoL management classification have no children and can be deleted. Unfortunately, it cannot be done in this list/tool:

https://www.checklistbank.org/catalogue/3/duplicates?catalogueKey=3&category=uninomial&limit=100&minSize=2&mode=STRICT&offset=0&rank=family&rank=class&rank=order&rank=phylum&rank=suborder&rank=infraorder&rank=superfamily&rank=subfamily&rank=suprageneric%20name&rank=superorder&rank=subclass&rank=superclass&rank=subphylum&rankDifferent=false&status=accepted&withDecision=false

= FIXED 2023-05-09, obsolete taxa deleted in CoL management classification in the Assembly tool

Examples: image image

image

yroskov commented 1 year ago

@mdoering, as far as I can see, no actions (decisions) can be applied through this list/tool: https://www.checklistbank.org/catalogue/3/duplicates?catalogueKey=3&category=uninomial&limit=100&minSize=2&mode=STRICT&offset=0&rank=family&rank=class&rank=order&rank=phylum&rank=suborder&rank=infraorder&rank=superfamily&rank=subfamily&rank=suprageneric%20name&rank=superorder&rank=subclass&rank=superclass&rank=subphylum&rankDifferent=false&status=accepted&withDecision=false

Is it correct?

mdoering commented 1 year ago

Yes, as we discussed in yesterday's call, project decisions are a pending issue.

New or modified decisions will only impact project data when it is resynced, so the behavior is expected.

To apply decisions to any taxon quickly, I would suggest making this possible from a taxon/synonym details page and linking to taxon/synonym pages from everywhere. @thomasstjerne does that make sense?