AtlasOfLivingAustralia / data-management

Data management issue tracking

Issue with QVMAG images #1052

Closed cha801p closed 2 weeks ago

cha801p commented 2 months ago

Investigate QVMAG image issue.

cha801p commented 2 months ago

Process Documentation: Handling and Deleting Image Files

Objective: This document outlines the steps to delete image files from a server, verify the deletion, and ensure data integrity by matching image counts post-deletion.

Process Overview:

  1. Download the File from S3
  2. Convert File Paths to URLs
  3. Soft Delete Images Using the URLs
  4. Manually Initiate Purge of Deleted Images
  5. Reingest Data
  6. Verification and Data Integrity Check

Detailed Steps:

1. Download the File from S3
Location: /pipelines-data/dr345/1/images-load/deletes/deleted.csv
Action: Download the deleted.csv file containing the list of image IDs to be deleted.

2. Convert File Paths to URLs
Purpose: Convert each entry in deleted.csv into the full URL required for the DELETE operation.
Tool: sed (stream editor).
Command: sed -e 's/^/https:\/\/images.ala.org.au\/ws\/image\//' deleted.csv > urls.txt
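A slightly more defensive variant of the sed command in step 2 might look like the sketch below. It assumes deleted.csv holds one image ID per line with no header row, which the ticket does not confirm. The function name `ids_to_urls` and the variable `BASE_URL` are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of image IDs into image-service URLs.
# Assumption: one ID per line, no header row.
BASE_URL="https://images.ala.org.au/ws/image/"

ids_to_urls() {
    # Strip Windows carriage returns, drop blank lines, then prefix each ID.
    # Using '|' as the sed delimiter avoids escaping the slashes in the URL.
    tr -d '\r' < "$1" | sed -e '/^[[:space:]]*$/d' -e "s|^|${BASE_URL}|"
}

# Usage: ids_to_urls deleted.csv > urls.txt
```

Dropping blank lines and carriage returns first means a stray empty row or a file saved on Windows does not produce a malformed URL in urls.txt.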

3. Soft Delete from Images
Method: Use xargs and curl to send a DELETE request for each URL in urls.txt.
Headers: Include the API key and an accept header.
Command: xargs -n 1 curl -v -X DELETE -H "apiKey: XXX" -H 'accept: application/json' < urls.txt
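When the list is long, it can help to record the HTTP status returned for each URL rather than reading through `curl -v` output. The following is a minimal sketch of that idea, not the command used in this ticket. The function name `soft_delete_urls` and the `API_KEY` and `DRY_RUN` variables are hypothetical; the request itself uses the same method and headers as the command in step 3.

```shell
#!/bin/sh
# Hypothetical wrapper around the DELETE call, one URL per line in the input file.
# API_KEY must be supplied by the caller; DRY_RUN=1 prints instead of sending.

soft_delete_urls() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue            # skip blank lines
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "DRY-RUN DELETE $url"
            continue
        fi
        # -s: quiet, -o /dev/null: discard the body, -w: print only the HTTP status
        status=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
            -H "apiKey: ${API_KEY}" -H 'accept: application/json' \
            "$url" </dev/null)
        echo "$status $url"
    done < "$1"
}

# Usage: API_KEY=... soft_delete_urls urls.txt | tee delete.log
```

Running it once with `DRY_RUN=1` shows exactly which URLs would be hit before any request is sent.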

4. Go to Image Admin Tools
URL: https://images.ala.org.au/admin/tools
Action: Open this URL in a web browser and go to the 'Purge Deleted Images' tool.
Procedure: Click the 'Purge Deleted Images' button to initiate the purge process.

5. Reingest Data
Procedure: Reingest the dataset on Airflow with load_images set to true.
Consideration: load_images: "true"

6. Verification and Data Integrity Check
Post-Deletion Verification:
Confirm that the example image URL no longer works.
Check if deleted.csv is empty.
Confirm that image counts match the expected results after the deletion process.
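Two of the verification checks can be scripted. The sketch below assumes that "does not work" means the image URL returns a non-2xx HTTP status; the ticket does not say which status the image service returns for a deleted image. All function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical verification helpers for the post-deletion checks.

# Returns 0 when the file is missing or has zero bytes.
is_empty_file() {
    [ ! -s "$1" ]
}

# Interprets an HTTP status code:
#   0 = assumed gone, 1 = still served, 2 = inconclusive (no connection)
status_means_gone() {
    case "$1" in
        000) return 2 ;;   # curl reports 000 when it could not get a response
        2??) return 1 ;;   # a 2xx response means the image is still available
        *)   return 0 ;;   # any other status is assumed to mean deleted
    esac
}

# Fetches the URL and classifies the response with status_means_gone.
url_is_gone() {
    status_means_gone "$(curl -s -o /dev/null -w '%{http_code}' "$1")"
}

# Usage:
#   is_empty_file deleted.csv && echo "deleted.csv is empty"
#   url_is_gone "<example image URL>" && echo "image no longer served"
```

Comparing image counts is left out because the source of the expected count is not described in this ticket.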

cha801p commented 2 months ago

Ticket Update: April 19, 2024 (5 PM)

Issue: Missing images

Solution: Reloaded data after deleting images

Actions Taken:

After the above process for deleting specific image files from our server was completed successfully, an email notification was sent to the data provider confirming that the deletion and reingestion were complete.

Status: Waiting for a reply from the data provider

cha801p commented 2 months ago

Ticket Update: April 22, 2024 (12:30 PM)

Issue: Missing images - email update from data provider after reloading images (following the above process)

Solution: The issue with images not appearing in the biocache interface has been resolved. It was initially identified as a cache-related problem. The images were then checked for visibility and correctness on the biocache platform.

Actions Taken:

Validation: Peggy and Raj successfully verified the visibility of the images on the biocache platform, confirming that the issues were resolved post-reload.

Status:

peggynewman commented 2 weeks ago

This was a problem with how specimen images appeared in the BIE. Slack discussion here: https://atlaslivingaustralia.slack.com/archives/C05LK6UT5D1/p1714447576793399 Fixed.