AtlasOfLivingAustralia / data-management

Data management issue tracking

Data load: Logan City Council Species Sightings #1043

Closed timhicks-ala closed 2 months ago

timhicks-ala commented 3 months ago

From https://support.ehelp.edu.au/a/tickets/197899

Existing data resource (2,409 records): https://collections.ala.org.au/public/show/dr2592

The newly supplied file has ~680 records, so it is likely an incremental update to the existing records rather than a replacement of the whole dataset.

A new metadata file has also been provided.

cha801p commented 3 months ago

Ticket Update: March 26, 2024 (5 PM)

Issue: Data Refresh

Solution: Successfully load the new dataset into biocache

Actions Taken: Successfully loaded the data on test

Loaded data for review: Metadata: data

Logs:
INFO [2024-03-21 06:30:35,144+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Running the pipeline
INFO [2024-03-21 06:30:36,071+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Checking the percentage change in new UUIDs:
INFO [2024-03-21 06:30:36,073+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: newUuids: 686.0, preservedUuids: 2409.0, orphanedUniqueKeys: 0.0
INFO [2024-03-21 06:30:36,073+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Percentage UUID change: 22, allowed percentage: 50, override percentage check: false
INFO [2024-03-21 06:30:36,073+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Backing up existing UUIDs to /data/pipelines-data/dr2592/1/identifiers/ala_uuid_backup1711002636073
INFO [2024-03-21 06:30:36,073+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Pipeline complete.
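For anyone reading the log above, here is a rough sketch of how the logged 22% can be reproduced from the logged counts. The formula (new UUIDs as a share of new plus preserved UUIDs, truncated to an integer) is an assumption that happens to match the logged value; it is not taken from the pipeline source, and the function names are hypothetical.

```python
import re

# Counts line copied from the ALAUUIDMintingPipeline log in this ticket.
LOG_LINE = "newUuids: 686.0, preservedUuids: 2409.0, orphanedUniqueKeys: 0.0"
ALLOWED_PERCENTAGE = 50  # "allowed percentage: 50" from the same log


def parse_counts(line):
    """Extract the 'name: number' pairs from a counts log line."""
    return {name: float(value) for name, value in re.findall(r"(\w+): ([\d.]+)", line)}


def percentage_change(counts):
    """ASSUMED formula: new UUIDs as a share of all UUIDs after the load."""
    total = counts["newUuids"] + counts["preservedUuids"]
    return int(100 * counts["newUuids"] / total) if total else 0


counts = parse_counts(LOG_LINE)
change = percentage_change(counts)
within_limit = change <= ALLOWED_PERCENTAGE
print(f"Percentage UUID change: {change}, within limit: {within_limit}")
```

Under that assumption, 686 / (686 + 2409) is about 22%, which is below the allowed 50%, so the load proceeds without needing the override.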

Status: Waiting for confirmation from the data provider

cha801p commented 3 months ago

Ticket Update: April 3, 2024 (5 PM)

Issue: Data Refresh

Solution: Successfully load the new dataset into biocache

Actions Taken: Successfully loaded the data on biocache

- Data review
- Columns renamed: occurrenceID to catalogNumber
- DwCA created locally
- Loaded the data on collectory
- Ingest_small_dataset kept failing
- Reingested the data
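As an illustration of the column rename step, a minimal sketch using Python's standard csv module. The sample rows, the scientificName column, and the function name are made up for the example; the actual supplied file and the tooling used for this load are not shown in this ticket.

```python
import csv
import io


def rename_column(src, dst, old="occurrenceID", new="catalogNumber"):
    """Copy CSV rows from src to dst, renaming one header and keeping column order."""
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames or []
    if old not in fieldnames:
        raise ValueError(f"column {old!r} not found in header")
    writer = csv.DictWriter(dst, fieldnames=[new if n == old else n for n in fieldnames])
    writer.writeheader()
    for row in reader:
        row[new] = row.pop(old)  # move the value under the new column name
        writer.writerow(row)


# Hypothetical sample data, not taken from the supplied file.
source = io.StringIO("occurrenceID,scientificName\nLCC-1,Hypothetical species\n")
output = io.StringIO()
rename_column(source, output)
lines = output.getvalue().splitlines()
print(lines)
```

The values themselves are untouched; only the header changes, so whichever column the data resource uses as its unique key still carries the same identifiers.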

Problems encountered:
- Incremental load was set to True, but this was not reflected on collectory
- This created orphaned records, as dwca-imports was replaced by the new DwCA (with only new records)
- The old DwCA was replaced on S3 and the preingestion was re-run
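To illustrate why replacing the archive with a delta-only file orphans records, a simplified sketch using set arithmetic on unique keys. This is a toy model with hypothetical keys and function names, not the pipeline's actual logic.

```python
def classify_keys(existing, incoming):
    """Compare previously minted keys with the keys in the archive being loaded."""
    existing, incoming = set(existing), set(incoming)
    return {
        "new": incoming - existing,        # keys that would get fresh UUIDs
        "preserved": incoming & existing,  # keys that keep their UUIDs
        "orphaned": existing - incoming,   # keys that vanish from the archive
    }


# Hypothetical keys: three already loaded, one arriving in the update.
existing_keys = {"A", "B", "C"}
delta_keys = {"D"}

# Archive replaced by the delta only: every existing key is orphaned.
replaced = classify_keys(existing_keys, delta_keys)

# Archive containing old plus new records: nothing is orphaned.
merged = classify_keys(existing_keys, existing_keys | delta_keys)

print(sorted(replaced["orphaned"]), sorted(merged["orphaned"]))
```

In the second case the counts have the same shape as the logged result for this load (new keys plus preserved keys, zero orphaned).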

Loaded data for review: Metadata: data

Logs:
INFO [2024-04-03 03:27:49,897+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Create ALAUUIDRecords and write out to AVRO
INFO [2024-04-03 03:27:49,926+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Running the pipeline
INFO [2024-04-03 03:27:50,808+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Checking the percentage change in new UUIDs:
INFO [2024-04-03 03:27:50,809+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: newUuids: 686.0, preservedUuids: 2409.0, orphanedUniqueKeys: 0.0
INFO [2024-04-03 03:27:50,810+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Percentage UUID change: 22, allowed percentage: 50, override percentage check: false
INFO [2024-04-03 03:27:50,810+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Backing up existing UUIDs to /data/pipelines-data/dr2592/1/identifiers/ala_uuid_backup_1712114870810
INFO [2024-04-03 03:27:50,810+0000] [main] au.org.ala.pipelines.beam.ALAUUIDMintingPipeline: Pipeline complete.
INFO [2024-04-03 03:27:50,810+0000

Status: Waiting for indexing

cha801p commented 3 months ago

Status: Review links sent to the data provider

peggynewman commented 2 months ago

Went to prod last week. Is this good to close @cha801p?

cha801p commented 2 months ago

@peggynewman Review links have already been sent to the data provider; I am just waiting for confirmation. I will send a follow-up email today.