AtlasOfLivingAustralia / data-management

Data management issue tracking

New Data Load : TERN - Events Data dr19815 #984

Status: Closed (cha801p closed this issue 3 months ago)

cha801p commented 1 year ago

Please refer to https://github.com/AtlasOfLivingAustralia/data-management/issues/796 for previous data-related conversations and comments.

cha801p commented 1 year ago

Ticket Update: October 3, 2023 (4:45 PM)

Issue: New Data Load for TERN dr19815 - Events System

Resolution: Successfully load the data into biocache.

Actions Taken: Following our discussion with Peggy and Doug on September 27, we agreed to collaborate on the column name mapping. In the meantime, I created a Python notebook that demonstrates every data operation used to improve the data representation. It covers the following steps:

  1. Data retrieval through API.
  2. Renaming columns and mapping them to DwC terms for both events and observations.
  3. Generating locationID links.
  4. Checking for duplicates and removing them.
  5. Saving all data to CSV, including location information that was previously omitted.
  6. Generating meta.xml.
  7. Generating a DwCA (Darwin Core Archive).
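
For anyone following along without the notebook, steps 2 to 7 can be sketched roughly as below. This is a minimal illustration, not the notebook's actual code: the source column names in `EVENT_COLUMN_MAP`, the hash-based `locationID` scheme, the `event.csv` file name, and the sample rows are all assumptions made for the example. It covers only the event core (no observations) and skips the API retrieval in step 1.

```python
import csv
import hashlib
import io
import zipfile
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical source-column -> DwC term mapping; the real TERN column names differ.
EVENT_COLUMN_MAP = {
    "visit_id": "eventID",
    "visit_date": "eventDate",
    "site_name": "locality",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}
FIELDS = list(EVENT_COLUMN_MAP.values()) + ["locationID"]


def rename_columns(rows, column_map):
    """Step 2: keep only the mapped columns, renamed to DwC terms."""
    return [{dwc: row[src] for src, dwc in column_map.items() if src in row} for row in rows]


def add_location_id(row):
    """Step 3: derive a stable locationID from locality and coordinates (assumed scheme)."""
    key = "|".join(row.get(k, "") for k in ("locality", "decimalLatitude", "decimalLongitude"))
    return {**row, "locationID": hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]}


def drop_duplicates(rows, key="eventID"):
    """Step 4: keep the first row seen for each key value."""
    seen, unique = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def to_csv(rows, fields):
    """Step 5: serialise rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_meta_xml(fields, filename="event.csv"):
    """Step 6: describe the core file so a DwCA reader can map columns to terms."""
    archive = ET.Element("archive", {"xmlns": "http://rs.tdwg.org/dwc/text/"})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": ",",
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": '"',
        "ignoreHeaderLines": "1",
        "rowType": DWC + "Event",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = filename
    ET.SubElement(core, "id", {"index": "0"})  # column 0 (eventID) is the core id
    for index, term in enumerate(fields):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC + term})
    return ET.tostring(archive, encoding="unicode")


def build_dwca(rows, fields):
    """Step 7: zip the CSV and meta.xml into an in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("event.csv", to_csv(rows, fields))
        archive.writestr("meta.xml", build_meta_xml(fields))
    return buf.getvalue()


# Invented sample rows standing in for the API response (step 1 is not shown).
raw = [
    {"visit_id": "v1", "visit_date": "2021-05-01", "site_name": "Site A", "lat": "-35.1", "lon": "149.2"},
    {"visit_id": "v1", "visit_date": "2021-05-01", "site_name": "Site A", "lat": "-35.1", "lon": "149.2"},
    {"visit_id": "v2", "visit_date": "2021-05-02", "site_name": "Site A", "lat": "-35.1", "lon": "149.2"},
]
events = drop_duplicates([add_location_id(r) for r in rename_columns(raw, EVENT_COLUMN_MAP)])
dwca_bytes = build_dwca(events, FIELDS)
```

Deriving `locationID` from the site attributes means that events at the same site share an identifier, which is one way the location information could stay attached to the events without being merged into every observation row.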

This data was loaded onto databox, and we ran the Elastic_dataset_indexing DAG to load the data into the events system.

Next Steps:

Discussion Points:

  1. Is the data structure correct?
  2. In the previous code, we were merging site information with observation data. Is this still necessary?

Key Outcomes: After reloading the data, location information is now visible in the Events system. Additionally, Sites and Maps have been updated for TERN in the Events system.

peggynewman commented 10 months ago

Peggy to review Additional Properties and event types

Also note: mapping between ABIS and DwC from TERN: https://linkeddata.tern.org.au/information-models/tern-ontology/cookbook/darwin-core-occurrence

cha801p commented 9 months ago

Ticket Update: December 14, 2023 (8 PM)

Issue: New event - dataset to load on prod.

Solution: Successfully load the new dataset into biocache.

Actions Taken:

Communication: The data provider was informed regarding Raj's leave (expecting test links soon).

Status: Work Pending

cha801p commented 8 months ago

Status: Databox link sent to data provider, waiting for approval.

Next Step: Once verified by the data provider, load the data on Prod.