AtlasOfLivingAustralia / data-management

Data management issue tracking

NSW AVH feed - Data Refresh #1088

Closed: cha801p closed this issue 1 month ago

cha801p commented 3 months ago

Issue: Data Refresh - NSW AVH

Solution: Successfully load the dataset on biocache-test

Actions Taken:

Links: Metadata: data

Issues Encountered:
- The data was oddly formatted, so reading it with UTF-8 encoding failed
- Attempted reading the data using multiple encodings
- Experimented with converting the TXT file to CSV to read the data (unsuccessful)
- SOLR_dataset_indexing FAILED with the following error:

24/07/04 02:21:12 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message. org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.

_Error from server at http://aws-solr-test-2.ala:8983/solr/biocache-2024-06-04-06-20_shard2_replica_n3: ERROR: [doc=0909f47f-11d8-4bf0-ba32-e6157f790a8f] Error adding field 'elevationSource'='Collector' msg=For input string: "Collector"_

Troubleshooting: The elevationSource column contained NaN in every row except the last, which held the value Collector. The column was deleted to eliminate the error, and the issue was reported to Systems.
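A check like the one below could flag this kind of value before indexing. It is a minimal sketch, assuming the Solr field for elevationSource is numeric (which the `For input string: "Collector"` message suggests but the ticket does not confirm). The function name `find_non_numeric` and the sample catalogue numbers are hypothetical, not taken from the NSW AVH feed.

```python
import csv
import io


def find_non_numeric(rows, column):
    """Return (row_number, value) pairs where `column` is non-empty and not a number."""
    offenders = []
    for row_number, row in enumerate(rows, start=1):
        value = (row.get(column) or "").strip()
        if not value:
            continue  # empty cells are fine; they are simply not indexed
        try:
            float(value)  # "NaN" also parses as a float, so it is not flagged
        except ValueError:
            offenders.append((row_number, value))
    return offenders


# Hypothetical sample mimicking the situation described: empty except the last row.
sample = io.StringIO(
    "catalogNumber,elevationSource\n"
    "NSW 1,\n"
    "NSW 2,\n"
    "NSW 3,Collector\n"
)
offenders = find_non_numeric(csv.DictReader(sample), "elevationSource")
print(offenders)
```

Running a check of this kind over every column expected to be numeric would surface stray text values in a local review rather than as an indexing failure.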

Logs on Test:

24/07/04 01:45:22 INFO ALAUUIDMintingPipeline: Checking the percentage change in new UUIDs:
24/07/04 01:45:22 INFO ALAUUIDMintingPipeline: newUuids: 13613.0, preservedUuids: 772750.0, orphanedUniqueKeys: 1886.0
24/07/04 01:45:22 INFO ALAUUIDMintingPipeline: Percentage UUID change: 1, allowed percentage: 50, override percentage check: false
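The percentage check in the log above can be approximated as follows. This is only a sketch: the log does not show the formula the pipeline uses, so the calculation here (new UUIDs relative to preserved UUIDs, truncated to an integer) is an assumption, and `uuid_change_check` is a hypothetical helper rather than part of ALAUUIDMintingPipeline.

```python
def uuid_change_check(new_uuids, preserved_uuids, allowed_percentage=50, override=False):
    """Return (percentage, passed) for a UUID minting run.

    Assumption: percentage = new / preserved * 100, truncated to an integer.
    """
    if preserved_uuids == 0:
        # Assumed behaviour for a first load: there is nothing to compare against.
        return 0, True
    percentage = int(new_uuids / preserved_uuids * 100)
    passed = override or percentage <= allowed_percentage
    return percentage, passed


# Values copied from the log lines above.
percentage, passed = uuid_change_check(13613.0, 772750.0)
print(percentage, passed)
```

Under this assumption the logged values give a change of about 1.76%, which truncates to the reported 1 and sits well below the allowed 50.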

cha801p commented 3 months ago

Issue: Data Refresh - NSW AVH

Solution: Successfully load the dataset on biocache

Actions Taken:
- Data review
- DwCA created locally
- File uploaded to collectory
- Triggered pre-ingestion

Links: Metadata: data

Stats:
- Old occurrence count: 772,750 records
- Current occurrence count: 786,363 records
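These counts can be cross-checked against the UUID minting log from the test run. The short sketch below does the arithmetic. It assumes the test and production loads used the same archive, which the ticket does not state explicitly. The variable names are illustrative.

```python
# Figures copied from this ticket.
old_count = 772_750        # old occurrence count
current_count = 786_363    # current occurrence count
new_uuids_in_log = 13_613  # newUuids reported by ALAUUIDMintingPipeline on test

increase = current_count - old_count
growth_percent = round(increase / old_count * 100, 2)

print(increase, growth_percent)
print(increase == new_uuids_in_log)
```

The increase of 13,613 records matches the newUuids figure in the log, so the growth in the occurrence count is accounted for by newly minted UUIDs.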

Status:
- Waiting for images to be uploaded to images.ala
- Re-ingest the data once all the images are uploaded, to link images to occurrences

cha801p commented 2 months ago

Ticket Update: July 17, 2024 (10:30 AM)

Issue: NSW AVH - Data encoding issue

Solution: Successfully load the dataset with UTF-8 encoding

Actions Taken:

Links: Metadata: data

Status: - The data provider has identified the issue with UTF-8 encoding on their end and is working on the data reformatting.
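For reference, trying several encodings on a file, as was attempted earlier in this ticket, can be sketched as below. The candidate list (`utf-8`, `cp1252`, `latin-1`) is an assumption, since the ticket does not say which encodings were tried, and `decode_with_fallback` is a hypothetical helper.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Decode bytes with the first candidate encoding that does not raise.

    Returns (text, encoding_name). A successful decode does not prove the
    encoding is the right one; latin-1 in particular accepts any byte sequence.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Illustrative bytes: a collector name saved as Windows-1252 rather than UTF-8.
raw = "Müller".encode("cp1252")
text, used = decode_with_fallback(raw)
print(text, used)
```

Because a fallback can succeed with the wrong encoding, spot-checking names with diacritics after decoding is still worthwhile, and a fix at the provider's end remains the more reliable solution.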

cha801p commented 1 month ago

Data has been reloaded and the data provider has been informed. The conversation with the provider is recorded here: https://support.ehelp.edu.au/a/tickets/205967

That ticket is linked to another issue, quoted below: https://github.com/AtlasOfLivingAustralia/data-management/issues/1072

_Note how, in the following example, the identifiedByIDs link correctly with no duplication, but the recordedByIDs are duplicated and the links for the secondary collector are malformed:

https://avh.ala.org.au/occurrences/3d41575f-30ab-433e-a00c-eb45c95ebaf1

The problem occurs with other datasets as can be seen in this example from MEL:

https://avh.ala.org.au/occurrences/9b3081fc-cb48-44e6-a9d6-00409e6e1188_