AtlasOfLivingAustralia / data-management

Data management issue tracking

Dataset update: Canberra Garden Bird Survey #976

Closed: timhicks-ala closed this issue 3 weeks ago

timhicks-ala commented 9 months ago

Helpdesk ticket: https://support.ehelp.edu.au/a/tickets/184688

The Canberra Garden Bird Survey (CGBS) would like to provide an updated dataset to the ALA and has sent a data sample for evaluation. The details have been forwarded to Raj for review.

cha801p commented 9 months ago

Ticket Update: September 26, 2023 (5:30 PM)

Issue Resolved: Data Refresh for Canberra Garden Bird Survey

Resolution: The refreshed data was loaded into biocache.

Actions Taken:

  1. Previously, unique identifiers were generated from scientificName, locationID, and eventDate.
  2. The data provider confirmed that the new data load contains significant changes to scientificName and locationID values.
  3. The data provider introduced a new column, occurrenceID, to serve as the unique identifier for this data load and future ones.
  4. Because the new keys match none of the existing ones, this resulted in a 100% change in UUIDs.
  5. To accommodate this change, the "override_uuid_percentage_check" parameter in the Preingest_datasets DAG was set to "true".
  6. The column occurrenceId was renamed to occurrenceID.
  7. Data was first loaded onto databox for testing.
  8. The same loading process was then used to load the data on prod.
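
Steps 1 to 4 and step 6 can be illustrated with a minimal Python sketch. This is not the pipeline's code: the helper names, the pipe-separated key format, and the sample rows are all made up for illustration. It only shows why moving the unique key from scientificName + locationID + eventDate to occurrenceID leaves no keys in common with the previous load.

```python
import csv
import io

# Fields assumed to make up the old unique key (from step 1).
OLD_KEY_FIELDS = ("scientificName", "locationID", "eventDate")


def normalise_header(fieldnames):
    """Rename an occurrenceId column to the Darwin Core spelling occurrenceID."""
    return ["occurrenceID" if name == "occurrenceId" else name for name in fieldnames]


def read_records(text):
    """Parse CSV text into a list of dicts, applying the header rename."""
    reader = csv.reader(io.StringIO(text))
    header = normalise_header(next(reader))
    return [dict(zip(header, row)) for row in reader]


def composite_key(record):
    """Old-style key: the three fields joined with '|' (separator is an assumption)."""
    return "|".join(record[field] for field in OLD_KEY_FIELDS)


def occurrence_key(record):
    """New-style key: the provider-supplied occurrenceID."""
    return record["occurrenceID"]


# Invented sample rows, not taken from the CGBS dataset.
previous_load = (
    "scientificName,locationID,eventDate\n"
    "Gymnorhina tibicen,S001,2022-07-04\n"
)
refreshed_load = (
    "occurrenceId,scientificName,locationID,eventDate\n"
    "CGBS-1,Gymnorhina tibicen,ACT-0001,2022-07-04\n"
)

old_keys = {composite_key(r) for r in read_records(previous_load)}
new_keys = {occurrence_key(r) for r in read_records(refreshed_load)}

preserved = old_keys & new_keys   # keys present in both loads
orphaned = old_keys - new_keys    # old keys with no match in the new load
minted = new_keys - old_keys      # keys that would need a new UUID

print(len(preserved), len(orphaned), len(minted))
```

With a different key scheme, no key from the previous load can match, so every record is treated as new and every old key is orphaned, which is the situation step 5 had to override.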

Validation:

Logs snapshot (Prod):

23/09/26 03:58:31 INFO ALAUUIDMintingPipeline: Checking the percentage change in new UUIDs:
23/09/26 03:58:31 INFO ALAUUIDMintingPipeline: newUuids: 1855916.0, preservedUuids: 0.0, orphanedUniqueKeys: 1655345.0
23/09/26 03:58:31 INFO ALAUUIDMintingPipeline: Percentage UUID change: 100, allowed percentage: 50, override percentage check: true
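
The figures in the log are consistent with a check along the following lines. This is a sketch under assumptions, not the ALAUUIDMintingPipeline source: the function names are hypothetical, and the formula (new UUIDs as a share of new plus preserved UUIDs) is a guess that happens to reproduce the logged 100%. The actual pipeline may compute the percentage differently.

```python
def uuid_change_percentage(new_uuids, preserved_uuids):
    """Share of UUIDs in this load that are newly minted, as a percentage.

    Assumed formula; the real pipeline may use a different one.
    """
    total = new_uuids + preserved_uuids
    if total == 0:
        return 0.0
    return 100.0 * new_uuids / total


def check_uuid_change(new_uuids, preserved_uuids, allowed_percentage=50, override=False):
    """Return the change percentage, or raise if it exceeds the limit without an override."""
    percentage = uuid_change_percentage(new_uuids, preserved_uuids)
    if percentage > allowed_percentage and not override:
        raise ValueError(
            f"UUID change of {percentage:.0f}% exceeds the allowed {allowed_percentage}%"
        )
    return percentage


# Counts taken from the prod log above; the override mirrors the flag set in step 5.
result = check_uuid_change(1855916, 0, allowed_percentage=50, override=True)
print(result)
```

Without the override, the same counts would stop the load, which is the safeguard against accidentally re-minting UUIDs for a whole dataset.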