CatalogueOfLife / data

Repository for COL content

Resync old sectors #669

Open mdoering opened 3 weeks ago

mdoering commented 3 weeks ago

Some sectors were last synced in 2020, before we actually went live with the new infrastructure. As a result, some tracking records are missing, e.g. the actual source record a name came from. We should resync all these sectors, even if no data has changed, just to update the source tracking.

The following sectors were last synced before 2021:

SELECT s.id AS sector, s.subject_dataset_key, s.sync_attempt, si.finished, si.name_count, d.alias
FROM sector s
  LEFT JOIN dataset d ON d.key = s.subject_dataset_key
  LEFT JOIN sector_import si ON si.dataset_key = s.dataset_key AND si.sector_key = s.id AND si.attempt = s.sync_attempt
WHERE s.dataset_key = 3 AND si.finished < '2021-01-01';

 sector | subject_dataset_key | sync_attempt |          finished          | name_count |                alias
--------+---------------------+--------------+----------------------------+------------+--------------------------------------
    104 |                1502 |            3 | 2020-08-04 17:14:59.04932  |          1 | Animal Biodiversity
    105 |                1502 |            3 | 2020-08-04 17:15:04.582723 |          2 | Animal Biodiversity
    106 |                1502 |            3 | 2020-08-04 17:15:14.924242 |          2 | Animal Biodiversity
    107 |                1502 |            3 | 2020-08-04 17:15:20.259852 |          2 | Animal Biodiversity
     46 |                1502 |            4 | 2020-08-04 17:14:49.439559 |          1 | Animal Biodiversity
     47 |                1502 |            3 | 2020-08-04 17:14:54.64497  |          1 | Animal Biodiversity
    280 |                1118 |            6 | 2020-07-18 05:03:38.083611 |        103 | HymIS Crabronidae & Rhopalosomatidae
    521 |                1104 |            7 | 2020-07-18 09:08:50.167303 |         36 | Phoronida Database
    584 |                1033 |            6 | 2020-07-19 05:53:50.31922  |          3 | Trichomycetes
    668 |                1502 |            3 | 2020-08-04 17:15:09.778921 |          2 | Animal Biodiversity
    613 |                1033 |            6 | 2020-07-19 05:54:09.778051 |        170 | Trichomycetes
    736 |                1078 |            1 | 2020-08-13 22:01:23.916943 |        984 | Mites GSD Tenuipalpidae
    730 |                1070 |            1 | 2020-08-13 21:53:58.12986  |       2514 | Mites GSD Phytoseiidae
(13 rows)

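To get a feel for how much a resync would touch per source, the rows above can be grouped by source dataset. The following is only a minimal sketch: the tuples are copied by hand from the query result, the names `STALE_SECTORS` and `summarize` are made up for illustration, and it assumes that `name_count` is the number of names written by the last sync of each sector.

```python
from collections import defaultdict

# (sector id, subject_dataset_key, alias, name_count), copied from the query result above
STALE_SECTORS = [
    (104, 1502, "Animal Biodiversity", 1),
    (105, 1502, "Animal Biodiversity", 2),
    (106, 1502, "Animal Biodiversity", 2),
    (107, 1502, "Animal Biodiversity", 2),
    (46, 1502, "Animal Biodiversity", 1),
    (47, 1502, "Animal Biodiversity", 1),
    (280, 1118, "HymIS Crabronidae & Rhopalosomatidae", 103),
    (521, 1104, "Phoronida Database", 36),
    (584, 1033, "Trichomycetes", 3),
    (668, 1502, "Animal Biodiversity", 2),
    (613, 1033, "Trichomycetes", 170),
    (736, 1078, "Mites GSD Tenuipalpidae", 984),
    (730, 1070, "Mites GSD Phytoseiidae", 2514),
]


def summarize(sectors):
    """Group sectors by source dataset and total their name counts."""
    summary = defaultdict(lambda: {"sectors": [], "names": 0})
    for sector_id, dataset_key, alias, name_count in sectors:
        entry = summary[(dataset_key, alias)]
        entry["sectors"].append(sector_id)
        entry["names"] += name_count
    return dict(summary)


if __name__ == "__main__":
    for (dataset_key, alias), entry in sorted(summarize(STALE_SECTORS).items()):
        print(f"{dataset_key} {alias}: {len(entry['sectors'])} sectors, {entry['names']} names")
```

Grouping by dataset rather than by sector makes it easier to decide per source whether a resync is acceptable, since any manual edits made in the project would be discussed per source anyway.
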
yroskov commented 3 weeks ago

The problem is that I have done a lot of cleanup of duplicated taxa, some of it directly in the project (for various reasons). Your suggestion to rerun the syncs comes too late and could wipe out part of my work. Such steps should be done at the beginning of the cycle.

@gdower, what do you think?

yroskov commented 3 weeks ago

HymIS Crabronidae & Rhopalosomatidae, Phoronida Database, Trichomycetes, Mites GSD Tenuipalpidae, and Mites GSD Phytoseiidae were all imported from AC19. Is a resync safe for them, @gdower, @mdoering?

Animal Biodiversity was also imported from AC19. I used this resource to add a few missing taxa to the management classification of Coleoptera; it has 0 species. I would suggest waiting for the new Coleoptera classification from Diana.

yroskov commented 3 weeks ago

A resync of HymIS Crabronidae might be risky. If I am not mistaken, I cleaned up some "cross duplicates" in Crabronidae directly in the project.