CatalogueOfLife / testing

Editorial tests and discussion to prepare for COL releases

TITAN (id 1032): test report #231

Open yroskov opened 1 year ago

yroskov commented 1 year ago

TITAN DEV2 https://www.dev.checklistbank.org/dataset/55378/classification

Version crawled from the website http://titan.gbif.fr/.

Imported 2023-06-29:

Classification:

yroskov commented 1 year ago

TITAN DEV3 https://www.dev.checklistbank.org/dataset/55400/classification

Imported 2023-06-30:

yroskov commented 12 months ago

(TITAN DEV3)

...there are no type subspecies in the CLB version of the checklist because the subspecies pages contain "=" (such pages were excluded because they create incorrect placements of species in genera):

In TITAN (http://titan.gbif.fr/sel_synonyme.php?numero=12081&id_nom_synonyme=305904&nom_synonyme=Acalolepta%20(Dihammus)%20rusticator%20rusticator): image
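A minimal sketch of one way the crawler could keep these names, assuming it has the full trinomial shown on each page. The functions `is_type_subspecies` and `keep_page` are hypothetical and not part of the actual crawler script. A type subspecies repeats the species epithet, so such pages could be recognized and exempted from the "=" filter instead of being dropped.

```python
import re

# Matches a parenthesised subgenus such as " (Dihammus)"
SUBGENUS = re.compile(r"\s*\([^)]*\)")


def is_type_subspecies(name: str) -> bool:
    """True if a trinomial repeats the species epithet as the subspecies
    epithet, e.g. 'Acalolepta (Dihammus) rusticator rusticator'.
    Any subgenus in parentheses is ignored."""
    parts = SUBGENUS.sub("", name).split()
    # Expect exactly: genus, species epithet, subspecies epithet
    return len(parts) == 3 and parts[1] == parts[2]


def keep_page(name: str, has_equals: bool) -> bool:
    """Hypothetical crawler filter: skip pages containing '=',
    except when the page is for a type subspecies."""
    return not has_equals or is_type_subspecies(name)


print(is_type_subspecies("Acalolepta (Dihammus) rusticator rusticator"))
```

Whether exempting these pages reintroduces the incorrect species placements mentioned above would still need to be checked against TITAN.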

yroskov commented 12 months ago

SUGGESTION: ignore portions in brackets during conversion (i.e. no subtribes)

(Example: tribe Cerambycina & tribe Cerambycinae)

In TITAN, the list of tribes: image

In CLB: image

Other cases:

In TITAN: image

In CLB: image

In TITAN: image

In CLB: image

In TITAN: image

In CLB: image

In TITAN: image
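A minimal sketch of the suggestion to ignore bracketed portions, assuming the crawler receives tribe labels with the subtribe in brackets, e.g. `Cerambycini (Cerambycina)`. The label format and the helpers `strip_bracketed` and `unique_tribes` are assumptions, not the crawler's actual code.

```python
import re

# A bracketed portion in either round or square brackets, with leading spaces
BRACKETED = re.compile(r"\s*(?:\([^)]*\)|\[[^\]]*\])")


def strip_bracketed(label: str) -> str:
    """Drop any '(...)' or '[...]' part of a label and normalise whitespace."""
    return " ".join(BRACKETED.sub("", label).split())


def unique_tribes(labels):
    """Collapse labels that differ only in their bracketed subtribe."""
    return {strip_bracketed(label) for label in labels}


print(strip_bracketed("Cerambycini (Cerambycina)"))
```

Stripping before the tree is built means that all subtribe variants of one tribe collapse into a single tribe node, so no subtribes are created.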

yroskov commented 12 months ago

Tribe "nomina nuda" with 7 genera: its species are not present in CLB.
Tribe "Fossiles" with 33 genera: present in TITAN, but not in CLB.

In theory, these 33 genera (http://titan.gbif.fr/sel_tribu.php) should be attached directly to the family Cerambycidae, but this may produce split genera. Therefore, the species should be attached directly to the family Cerambycidae (i.e., without a parent genus) with "Accepted" status.
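A minimal sketch of this placement rule during conversion, assuming each crawled species record carries its genus and tribe, and assuming the rule applies to the species of both tribes. `SpeciesRecord` and `assign_parent` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Tribes whose genera should not become parent taxa (assumption: both tribes)
UNPLACED_TRIBES = {"nomina nuda", "Fossiles"}
FAMILY = "Cerambycidae"


@dataclass
class SpeciesRecord:
    name: str
    genus: str
    tribe: str
    parent: Optional[str] = None
    status: str = "accepted"


def assign_parent(rec: SpeciesRecord) -> SpeciesRecord:
    """Set the parent of a species record in place and return it."""
    if rec.tribe in UNPLACED_TRIBES:
        # Attach directly to the family, without a parent genus, as accepted
        rec.parent = FAMILY
        rec.status = "accepted"
    else:
        # Normal case: the genus is the parent
        rec.parent = rec.genus
    return rec
```

Attaching the species to the family, rather than creating the genera, avoids the split genera mentioned above.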

yroskov commented 12 months ago

Example: Aegomorphus pereirai (Prosen & Lane, 1955)

In CLB: https://www.dev.checklistbank.org/dataset/55400/taxon/121

In TITAN: http://titan.gbif.fr/sel_genann1.php?numero=121 image

yroskov commented 12 months ago

TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification

Imported 2023-07-11, 1:39 PM

yroskov commented 12 months ago

TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification - new iteration

Imported 2023-07-11, 5:37 PM

yroskov commented 12 months ago

TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification - next iteration

Imported 2023-07-12, 4:04 PM

yroskov commented 12 months ago

TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification - new iteration

Imported 2023-07-12, 8:29 PM

NEXT STEPS IN PRODUCTION:

yroskov commented 11 months ago

TITAN DEV4 https://www.dev.checklistbank.org/dataset/56387/classification - another iteration

Imported 2023-07-12, 11:31 PM

NEXT STEPS IN PRODUCTION:

yroskov commented 11 months ago

Copy of ACCESS database received 2023-07-25.

yroskov commented 11 months ago

TITAN of Jun 2023 / 2023-06-26 imported to the production CLB 2023-07-13

ISSUES assessed 2023-08-08

TASKS

Resolved 2023-08-08:

Sync started 2023-08-08. Sync failed.

yroskov commented 11 months ago

@olafbanki, the sync of the new TITAN data failed because of the "OTHER" license. Indeed, the metadata says "License: Other". I didn't find TITAN in the log of license negotiations (https://github.com/CatalogueOfLife/testing/issues/30). Could you please confirm the license for TITAN? (The data are ready for the August edition, but the sync is blocked by CLB due to the license mismatch.)

yroskov commented 10 months ago

@olafbanki, 2023-08-15: With the removal, I assume the metadata also changed, as I explicitly changed the licensing of TITAN for the Annual Checklist 2023. TITAN should be CC-BY. This agreement was reached with Thierry Bourgoin after neither we nor he managed to get in contact with Gerard (no contact since 2021).

Synced 2023-08-17

yroskov commented 10 months ago

TITAN of Jun 2023 / 2023-06-26 imported to DEV on 2023-08-23, 8:48 PM, as TITAN DEV2 after adjustments to the crawler script.

Tested 2023-08-24 at https://www.dev.checklistbank.org/dataset/55378/classification

It was a month ago (view in the portal): image

It is now (view in the DEV CLB): image

yroskov commented 10 months ago

https://www.dev.checklistbank.org/dataset/55378/taxon/6807 ≡ Anoploderma (Pathocerus) humboldti Lameere, 1912
= Anoploderma humboldti (Lameere, 1912) = should be without brackets

https://www.dev.checklistbank.org/dataset/55378/taxon/17280 = Logisticus oberthueri (Fairmaire, 1889) = should be without brackets (orthographic variant) = Logisticus oberthurii Fairmaire, 1889
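Under the ICZN rule on parentheses (Article 51.3), the author and date are enclosed in parentheses only when the species-group name is now combined with a genus other than the original one. A change of subgenus alone does not require them. A minimal sketch of that check, assuming the converter knows the original combination; `format_authorship` and `genus_only` are hypothetical helpers. The orthographic variant (oberthueri vs. oberthurii) is a separate issue and is not handled here.

```python
import re

# A parenthesised subgenus such as " (Pathocerus)"
SUBGENUS = re.compile(r"\s*\([^)]*\)")


def genus_only(name: str) -> str:
    """Return the generic name, ignoring any subgenus in parentheses."""
    return SUBGENUS.sub("", name).split()[0]


def format_authorship(original_combination: str, current_name: str,
                      author: str, year: int) -> str:
    """Bracket the authorship only if the genus (not the subgenus) changed."""
    authorship = f"{author}, {year}"
    if genus_only(original_combination) != genus_only(current_name):
        return f"({authorship})"
    return authorship


print(format_authorship("Anoploderma (Pathocerus) humboldti",
                        "Anoploderma humboldti", "Lameere", 1912))
```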

yroskov commented 10 months ago

Normalized and standardized distribution data (incl. ISO codes) are available per species on the website:

http://titan.gbif.fr/sel_pays1.php?numpays=4133: image
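A minimal sketch of exporting these per-species records as a ColDP-style `Distribution` table keyed by ISO codes. The input format, a list of (taxon ID, ISO code) pairs from the crawler, and the choice of columns (`taxonID`, `areaID`, `gazetteer`) are assumptions. The full column list should be checked against the ColDP specification.

```python
import csv
import io


def distribution_tsv(records):
    """records: iterable of (taxon_id, iso_code) pairs (hypothetical crawler output).
    Returns tab-separated text with one row per unique taxon/area pair."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["taxonID", "areaID", "gazetteer"])
    seen = set()
    for taxon_id, iso in records:
        key = (taxon_id, iso.upper())
        if key in seen:
            continue  # skip duplicate taxon/area pairs
        seen.add(key)
        writer.writerow([taxon_id, iso.upper(), "iso"])
    return buf.getvalue()


print(distribution_tsv([("t1", "br"), ("t2", "ar")]), end="")
```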

yroskov commented 10 months ago

TITAN dev imported for tests 2023-08-29; 2023-08-30 in CLB https://www.checklistbank.org/dataset/264420/imports

yroskov commented 10 months ago

Possible further improvements:

yroskov commented 10 months ago

TITAN dev imported for tests 2023-08-30 in CLB https://www.checklistbank.org/dataset/264420/imports (all data have been re-crawled; distribution data have been added)

Imported to CLB-prod 2023-08-31 as TITAN of Aug 2023 / 2023-08-29 (id 1032).

ISSUES assessed 2023-08-31

TASKS

@gdower, this is a new "feature" (perhaps something is also wrong with the IDs/parent-child relationships; I cannot block synonyms in CLB):

The same problem affects subspecies: ACC-SYN infraspecies (same accepted, same authors), 328 of 1310.

The same problem with duplicated synonyms vs. accepted names in the report SYN-SYN species (same accepted, same authors), 16 of 48 = FIXED by blocking in CLB.
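A minimal sketch of two checks that roughly mirror these "same accepted, same authors" reports, which could flag the duplicates in the converted data before import. The `NameUsage` record and both functions are hypothetical. CLB's own duplicate detection may compare names differently.

```python
from collections import defaultdict, namedtuple

NameUsage = namedtuple("NameUsage", "id name authorship status accepted_id")


def duplicated_synonyms(usages):
    """ACC-SYN: synonyms whose name and authorship equal their accepted name's."""
    by_id = {u.id: u for u in usages}
    dupes = []
    for u in usages:
        if u.status != "synonym":
            continue
        acc = by_id.get(u.accepted_id)
        if acc is not None and (u.name, u.authorship) == (acc.name, acc.authorship):
            dupes.append(u)
    return dupes


def duplicated_synonym_groups(usages):
    """SYN-SYN: groups of synonyms sharing accepted name, name and authorship."""
    groups = defaultdict(list)
    for u in usages:
        if u.status == "synonym":
            groups[(u.accepted_id, u.name, u.authorship)].append(u)
    return [g for g in groups.values() if len(g) > 1]
```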

Resolved 2023-08-31:

Sync 2023-08-31, 13.45-14.00 (Champaign) failed

yroskov commented 10 months ago

https://github.com/CatalogueOfLife/backend/issues/1246

yroskov commented 10 months ago

For the attention of @mdoering: Sync 2023-09-01, 7.58 (Champaign) failed again.

yroskov commented 10 months ago

TITAN of Aug 2023 / 2023-08-29 (id 1032) re-imported 2023-09-01.

TASKS

Resolved 2023-09-01:

Sync 2023-09-01 = failed

mdoering commented 10 months ago

The last syncs claim to have been canceled, not failed: https://www.checklistbank.org/catalogue/3/sector/sync?sectorKey=1996

The logs contain this suspicious entry:

ATTACH taxon tree ACCEPTED FAMILY Cerambycidae Latreille, 1802 [Cerambycidae] to SUPERFAMILY Chrysomeloidea [b1bcfd7d-b704-40b4-bf59-0bd6484330d8]. Blocking 6009 nodes

Blocking 6009 nodes? There are ~9500 decisions in place, mostly blocks, so maybe that's real. I will look into this on Monday; I have no immediate idea why that is.

yroskov commented 2 months ago

TITAN of 2023-12-22 / 2023-12-22; imported 2024-04-23

Metrics

ISSUES assessed 2024-05-01

TASKS

Resolved 2024-05-01:

Synced 2024-05-01