AtlasOfLivingAustralia / data-management

Data management issue tracking

QVMAG [dr345] data refresh #1079

Open cha801p opened 1 week ago

cha801p commented 1 week ago

QVMAG has sent the updated data. (Monday, 17 June 2024 at 2:01 PM)

  1. QVMAG have conducted image and data cleaning; most low-quality and duplicate images have been removed.

cha801p commented 1 week ago

Ticket Update: June 26, 2024 (10:30 AM)

Issue: Data refresh (https://collections.ala.org.au/public/show/dr345). The data provider has sent us cleaned data.

Solution: Upload the cleaned data and fix images

Actions Taken:

Details: There were approximately 17,000 images on images.ala for dr345, with only around 4,000 associated with occurrences.

Issue: The data provider sent clean data, necessitating the cleanup of images.

Process Conducted:

  1. Data Review: A thorough review of the image dataset was conducted.
  2. Image Identification: 13,000 images were identified that were not present in the cleaned dataset.
  3. CSV Creation: A CSV file was created containing the 13,000 images to be deleted.
  4. Image Deletion: Images were deleted from images.ala using an API call.
  5. Purge: The 'purge deleted images' was run from the admin tools on images.ala.
  6. Data Loading: Cleaned data was loaded onto the collectory.
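For anyone repeating this cleanup, the image identification and CSV creation steps can be sketched roughly as below. This is only an illustration, not the script that was actually run: the function names, the `imageIdentifier` column header and the sample IDs are all hypothetical, and the deletion API call itself is deliberately left out.

```python
import csv
import io


def find_orphaned_images(existing_ids, cleaned_ids):
    """Return IDs held by the image store but absent from the cleaned dataset."""
    return sorted(set(existing_ids) - set(cleaned_ids))


def write_deletion_csv(image_ids, out):
    """Write one image ID per row under a (hypothetical) header column."""
    writer = csv.writer(out)
    writer.writerow(["imageIdentifier"])  # assumed column name
    for image_id in image_ids:
        writer.writerow([image_id])


# Made-up sample IDs standing in for the ~17,000 stored images
# and the images still referenced by the cleaned dataset.
existing = ["img-001", "img-002", "img-003", "img-004"]
cleaned = ["img-002", "img-004"]

to_delete = find_orphaned_images(existing, cleaned)

buffer = io.StringIO()
write_deletion_csv(to_delete, buffer)
print(buffer.getvalue())
```

Using a set difference keeps the comparison fast even with tens of thousands of IDs, and sorting the result makes the CSV reproducible between runs.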

Logs:

24/06/24 08:52:20 INFO ALAUUIDMintingPipeline: Checking the percentage change in new UUIDs:
24/06/24 08:52:20 INFO ALAUUIDMintingPipeline: newUuids: 1394.0, preservedUuids: 119492.0, orphanedUniqueKeys: 57.0
24/06/24 08:52:20 INFO ALAUUIDMintingPipeline: Percentage UUID change: 1, allowed percentage: 50, override percentage check: false
24/06/24 08:52:20 INFO ALAUUIDMintingPipeline: Backing up existing UUIDs to hdfs:///pipelines-data/dr345/1/identifiers/ala_uuid_backup_1719219140783
24/06/24 08:52:20 INFO ALAUUIDMintingPipeline: Pipeline complete.
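As a sanity check on the logged figures, here is a rough sketch of how a 1% change could be derived from the counts above. The log does not show the exact formula ALAUUIDMintingPipeline uses, so the calculation (new UUIDs as a share of new plus preserved, truncated to an integer) and the `<=` comparison are assumptions; the function names are hypothetical.

```python
def uuid_change_percentage(new_uuids, preserved_uuids):
    """Assumed formula: new UUIDs as an integer percentage of all UUIDs."""
    total = new_uuids + preserved_uuids
    if total == 0:
        return 0
    return int(new_uuids * 100 / total)


def passes_check(percentage, allowed, override=False):
    """Assumed rule: pass when overridden or within the allowed percentage."""
    return override or percentage <= allowed


# Counts taken from the log lines above.
pct = uuid_change_percentage(1394.0, 119492.0)
ok = passes_check(pct, allowed=50, override=False)
print(pct, ok)
```

Under these assumptions the result matches the logged "Percentage UUID change: 1", well under the allowed 50, which is consistent with the pipeline completing without the override.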

Status: The image cleanup process is complete. The data has been loaded on prod, and the data provider has been informed.