Open yroskov opened 1 year ago
TITAN DEV3 https://www.dev.checklistbank.org/dataset/55400/classification
Imported 2023-06-30:
...there are no type subspecies in the CLB checklist version because of the "=" on subspecies pages (such pages were excluded because they create incorrect species placements in genera):
SUGGESTION: ignore portions in brackets during conversion (i.e. no subtribes)
(Example: tribe Cerambycina & tribe: Cerambycinae)
In TITAN, the list of tribes:
In CLB:
Other cases:
In TITAN:
in CLB:
In TITAN:
in CLB:
In TITAN:
in CLB:
In TITAN:
Tribe nomina nuda with 7 genera = species not present in CLB.
Tribe Fossiles with 33 genera = present in TITAN, not in CLB. In theory, these 33 genera (http://titan.gbif.fr/sel_tribu.php) should be attached directly to the family Cerambycidae, but we may get split genera. Thus, the species should be attached directly to the family Cerambycidae (i.e. without a parent genus) with "Accepted" status.
Example: Aegomorphus pereirai (Prosen & Lane, 1955)
In CLB: https://www.dev.checklistbank.org/dataset/55400/taxon/121
TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification
Imported 2023-07-11, 1:39 PM
TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification - new iteration
Imported 2023-07-11, 5:37 PM
TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification - next iteration
Imported 2023-07-12, 4:04 PM
[x] Imported: 36,216 (vs 36,216) spp
[x] Three Incertae sedis tribes are not resolved yet: the tribes Incertae sedis and their child subtribes "Cerambycinae", "Lamiinae" & "Prioninae" need to be killed as intermediate ranks, and their genera should be attached directly to their parent subfamilies. = see fix in DEV2, not fully fixed; in DEV4
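A minimal sketch of the re-parenting requested here, assuming the classification can be held as a simple child-to-parent map. The node labels, the placeholder genus and the function name `collapse_nodes` are hypothetical; this is not the crawler script or a CLB API.

```python
from typing import Dict, Optional, Set


def collapse_nodes(parent_of: Dict[str, Optional[str]],
                   doomed: Set[str]) -> Dict[str, Optional[str]]:
    """Return a new child -> parent map without the `doomed` nodes.

    Every surviving node is attached to its nearest surviving ancestor,
    so genera under "Incertae sedis" end up directly under the subfamily.
    """
    def surviving_ancestor(node: str) -> Optional[str]:
        parent = parent_of.get(node)
        # Climb past every intermediate node that is being removed.
        while parent is not None and parent in doomed:
            parent = parent_of.get(parent)
        return parent

    return {node: surviving_ancestor(node)
            for node in parent_of if node not in doomed}


# Hypothetical fragment of the tree (labels are illustrative only).
tree = {
    "family Cerambycidae": None,
    "subfamily Lamiinae": "family Cerambycidae",
    "tribe Incertae sedis": "subfamily Lamiinae",
    "subtribe Lamiinae": "tribe Incertae sedis",
    "genus Aus": "subtribe Lamiinae",   # placeholder genus
}
flat = collapse_nodes(tree, {"tribe Incertae sedis", "subtribe Lamiinae"})
print(flat["genus Aus"])
```

With this input the placeholder genus is attached to "subfamily Lamiinae", and both intermediate nodes disappear from the map rather than staying behind as empty taxa.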
TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification - new iteration
Imported 2023-07-12, 8:29 PM
[x] Imported: 36,216 (vs 36,216) spp
[x] Three Incertae sedis tribes: genera moved to a next parent (Fix!); 3 empty Incertae sedis tribes should be blocked in Assembly (YR!)
NEXT STEPS IN PRODUCTION:
TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification - another iteration
Imported 2023-07-12, 11:31 PM
NEXT STEPS IN PRODUCTION:
Copy of ACCESS database received 2023-07-25.
TITAN of Jun 2023 / 2023-06-26 imported to the production CLB 2023-07-13
ISSUES assessed 2023-08-08
Multi Word Monomial, 520: 520 accepted subgenera recorded in combination with the genus name. https://www.checklistbank.org/catalogue/3/dataset/1032/workbench?facet=rank&facet=issue&facet=status&facet=nomStatus&facet=nameType&facet=field&facet=authorship&facet=authorshipYear&facet=extinct&facet=environment&facet=origin&issue=wrong%20monomial%20case&limit=800&offset=0
Examples: Abatocera (Sternobatocera) Breuning, 1943; Acalolepta (Pilohammus) Vitali, 2019; Bacchisa (Cyanastus) Pascoe, 1867
@gdower, it's not clear to me whether CLB is able to handle subgenera in this form or not.
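For illustration only, a sketch of how names of the form "Genus (Subgenus) Author, year" could be split before import. It says nothing about how the CLB name parser actually treats them; the regular expression and the names `Monomial` and `split_subgenus` are assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional


class Monomial(NamedTuple):
    genus: str
    subgenus: str
    authorship: str


# Genus, subgenus in parentheses, then whatever follows.
_SUBGENUS = re.compile(r"^([A-Z][a-z]+)\s*\(([A-Z][a-z]+)\)(.*)$")


def split_subgenus(name: str) -> Optional[Monomial]:
    """Split 'Genus (Subgenus) Authorship'; return None for anything else."""
    match = _SUBGENUS.match(name.strip())
    if match is None:
        return None
    rest = match.group(3).strip()
    # A lowercase word after the subgenus is an epithet, i.e. a species name.
    # Caveat: lowercase author particles ("de", "von") would also be rejected.
    if rest[:1].islower():
        return None
    return Monomial(match.group(1), match.group(2), rest)


print(split_subgenus("Abatocera (Sternobatocera) Breuning, 1943"))
print(split_subgenus("Anoploderma (Pathocerus) humboldti Lameere, 1912"))
```

The first call yields the three parts; the second returns `None` because the string is a species name with a subgenus, not a subgenus record.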
Names (all synonyms) containing the words "subgen.", "morpha", "subsp.", "sp." (3,246 names with "sp") blocked. These names should be blocked BEFORE resolving Tasks.
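A rough sketch of how such names could be pre-selected for blocking, assuming the names are available as plain strings. The marker set is taken from the list above; the function name `has_rank_marker` and the sample names (built on the placeholder genus "Aus") are hypothetical.

```python
# Markers listed in this issue, compared without the trailing dot.
MARKERS = {"subgen", "morpha", "subsp", "sp"}


def has_rank_marker(name: str) -> bool:
    """True if any whitespace-separated token is one of the markers.

    Matching is case-sensitive and on whole tokens, so an epithet that
    merely starts with "sp" (e.g. "spinosus") is not flagged.
    """
    for token in name.split():
        bare = token.strip("()[],;").rstrip(".")
        if bare in MARKERS:
            return True
    return False


# Hypothetical sample names.
samples = [
    "Aus bus subsp. cus Author, 1900",
    "Aus bus morpha cus Author, 1900",
    "Aus sp.",
    "Aus spinosus Author, 1900",
]
to_block = [name for name in samples if has_rank_marker(name)]
print(to_block)
```

Only the first three samples are selected; the list would still need a manual check before the blocks are applied in CLB.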
TASKS
Resolved 2023-08-08:
Sync started 2023-08-08
Sync failed:
@olafbanki, the sync of the new TITAN data failed due to the "OTHER" license. Indeed, the metadata says "License: Other". I didn't find TITAN in the log of license negotiations: https://github.com/CatalogueOfLife/testing/issues/30 Could you please confirm the license for TITAN? (The data are ready for the August edition, but the sync is blocked by CLB due to the license mismatch.)
@olafbanki, 2023-08-15: With the removal, I assume the metadata also changed, as I had explicitly changed the licensing of TITAN for the Annual Checklist 2023. TITAN should be CC-BY. This agreement was reached with Thierry Bourgoin after neither we nor he managed to get in contact with Gerard (no contact established since 2021).
Synced 2023-08-17
TITAN of Jun 2023 / 2023-06-26 imported to the DEV 2023-08-23 8:48 PM as titan dev2 after adjustments in the crawler script
Tested 2023-08-24 at https://www.dev.checklistbank.org/dataset/55378/classification
It was a month ago (view in the portal):
It is now (view in the DEV CLB):
[ ] to bring back authorstrings in combinations in synonymy = @gdower, I'm not sure whether the issue is fixed in this iteration:
https://www.dev.checklistbank.org/dataset/55378/taxon/1287
https://www.dev.checklistbank.org/dataset/55378/taxon/22127
https://www.dev.checklistbank.org/dataset/55378/taxon/28277
https://www.dev.checklistbank.org/dataset/55378/taxon/13725
[x] to bring distribution data = FIXED (agreed compromise: text lines as they appear on the source website). There is also normalized & standardized distribution on the website (see below).
[ ] bring back authorstrings in combinations in synonymy
Species for tests (with multiple protonyms):
https://www.dev.checklistbank.org/dataset/55378/taxon/1287
https://www.dev.checklistbank.org/dataset/55378/taxon/4133
https://www.dev.checklistbank.org/dataset/55378/taxon/2299
https://www.dev.checklistbank.org/dataset/55378/taxon/4997
https://www.dev.checklistbank.org/dataset/55378/taxon/548
https://www.dev.checklistbank.org/dataset/55378/taxon/549
Attention:
https://www.dev.checklistbank.org/dataset/55378/taxon/6807
≡ Anoploderma (Pathocerus) humboldti Lameere, 1912
= Anoploderma humboldti (Lameere, 1912) = should be no brackets
https://www.dev.checklistbank.org/dataset/55378/taxon/17280
= Logisticus oberthueri (Fairmaire, 1889) = should be no brackets (ortho variant)
= Logisticus oberthurii Fairmaire, 1889
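The bracket problem in these two cases could be checked mechanically. Below is a sketch that assumes binomials with an optional subgenus and applies the zoological convention that authorship goes in parentheses only when the species is combined with a genus other than the original one (a subgenus does not count). The regular expression and the function name `parentheses_ok` are assumptions for this example, not part of the crawler.

```python
import re
from typing import Optional

_NAME = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"       # genus
    r"(?:\s+\([A-Z][a-z]+\))?"       # optional subgenus, ignored
    r"\s+(?P<epithet>[a-z]+)"        # specific epithet
    r"\s+(?P<auth>.+)$"              # authorship, possibly in parentheses
)


def parentheses_ok(original: str, combination: str) -> Optional[bool]:
    """Compare a combination with its original name.

    Returns True if the parentheses around the authorship are used
    correctly, False if not, and None if either name cannot be parsed.
    """
    orig = _NAME.match(original.strip())
    comb = _NAME.match(combination.strip())
    if orig is None or comb is None:
        return None
    needs = orig.group("genus") != comb.group("genus")
    auth = comb.group("auth").strip()
    has = auth.startswith("(") and auth.endswith(")")
    return needs == has


print(parentheses_ok("Anoploderma (Pathocerus) humboldti Lameere, 1912",
                     "Anoploderma humboldti (Lameere, 1912)"))
print(parentheses_ok("Logisticus oberthurii Fairmaire, 1889",
                     "Logisticus oberthueri (Fairmaire, 1889)"))
```

Both calls report a mismatch, because the genus is unchanged and the authorship should therefore carry no brackets.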
Normalized & standardized distribution (incl. ISO codes) per species is available on the website:
TITAN dev imported for tests 2023-08-29; 2023-08-30 in CLB https://www.checklistbank.org/dataset/264420/imports
Possible further improvements:
TITAN dev imported for tests 2023-08-30 in CLB https://www.checklistbank.org/dataset/264420/imports (all data have been re-crawled; distribution data have been added)
Imported to CLB-prod 2023-08-31 as TITAN of Aug 2023 / 2023-08-29 (id 1032).
[x] Species page: [x] authorstrings in synonymy; [x] distribution
Sample spp:
Acanthocinus aedilis (Linné, 1758) https://www.checklistbank.org/dataset/264420/taxon/10462
Alexera barii (Jekel, 1861) https://www.checklistbank.org/dataset/264420/taxon/455
[x] block names containing "subgen.", "sp.", "subspecies", "rasse", "morpha", etc.; see also Multi Word Epithet & Parsed Name Differs
ISSUES assessed 2023-08-31
TASKS
@gdower, it's a new "feature" (perhaps something is also wrong with IDs/parent-child relationships - I cannot block synonyms in CLB):
The same problem with subspecies: ACC-SYN infraspecies (same accepted, same authors) 328 of 1310
The same problem with duplicated synonyms vs accepted names in the report SYN-SYN species (same accepted, same authors) 16 of 48 = FIXED through blocking in CLB
Resolved 2023-08-31:
Sync 2023-08-31, 13.45-14.00 (Champaign) failed
For attention of @mdoering: Sync 2023-09-01, 7.58 (Champaign) again failed
TITAN of Aug 2023 / 2023-08-29 (id 1032) re-imported 2023-09-01.
TASKS
Resolved 2023-09-01:
Sync 2023-09-01 = failed
The last syncs claim to have been canceled, not failed: https://www.checklistbank.org/catalogue/3/sector/sync?sectorKey=1996
The logs contain this suspicious entry:
ATTACH taxon tree ACCEPTED FAMILY Cerambycidae Latreille, 1802 [Cerambycidae] to SUPERFAMILY Chrysomeloidea [b1bcfd7d-b704-40b4-bf59-0bd6484330d8]. Blocking 6009 nodes
Blocking 6009 nodes? There are ~9500 decisions in place, mostly blocks. Maybe that's real then. I will look into this on Monday; no immediate idea why that is.
TITAN of 2023-12-22 / 2023-12-22; imported 2024-04-23
Metrics
ISSUES assessed 2024-05-01
TASKS
Resolved 2024-05-01:
Synced 2024-05-01
TITAN DEV2 https://www.dev.checklistbank.org/dataset/55378/classification
Version crawled from the website http://titan.gbif.fr/.
Imported 2023-06-29:
Classification: